Before you get started on the challenge, please read this first!
First things first:
- You get one attempt at the challenge!
- This is a timed challenge. The maximum time allowed once you begin is 130 minutes.
- This is an individual challenge; teams are not allowed. If we detect signs of collaboration, such as multiple IP addresses, your submission will be disqualified.
The challenge tests the following skills:
- Data profiling
- Data wrangling
- Data modeling
- Data visualization
Languages you may use:
- Python 3
- Julia 1.1.1
Files provided:
- train.csv - data used for training, including the target variable
- test.csv - data on which predictions are to be made
- sample_output.csv - sample format of the submission
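As a starting point for the data profiling step, the sketch below counts rows and missing values per column in a CSV file using only the Python standard library. The function name `profile_csv` is hypothetical, and the sketch assumes the files have a header row and that empty cells mean missing values; neither assumption is confirmed by the challenge description.

```python
import csv
from collections import Counter


def profile_csv(path):
    """Return (row_count, missing_per_column) for a CSV file with a header row.

    Assumption: an empty or whitespace-only cell counts as a missing value.
    """
    missing = Counter()
    rows = 0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        for row in reader:
            rows += 1
            for col in columns:
                # DictReader fills short rows with None, so guard against it.
                if (row[col] or "").strip() == "":
                    missing[col] += 1
    return rows, {col: missing[col] for col in columns}
```

Calling `profile_csv("train.csv")` and `profile_csv("test.csv")` would give a quick view of which columns need cleaning before modeling.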
What to submit:
- A well-commented Jupyter notebook
The notebook should contain the solution, visualizations, and a discussion of the thought process, including the top features that go into the model. If required, please generate new features. Make appropriate plots, annotate the notebook with Markdown cells, and explain the inferences you draw. A reader should be able to follow the notebook and understand the steps taken and the reasoning behind them. The solution will be graded on how effectively its visualizations convey the analysis and the modeling process.
The winner will be selected using the following criteria:
- Complete the challenge within the allotted time
The metric used for evaluating performance is Mean Absolute Error (MAE).
MAE = mean of the absolute differences between actual and predicted values.
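To make the metric concrete, here is a minimal sketch of MAE in plain Python. The function name `mean_absolute_error` and the argument names are illustrative choices, not part of the challenge materials, and the organizers' scoring script may differ in detail.

```python
def mean_absolute_error(actuals, predictions):
    """Mean of |actual - prediction| over paired values."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    if len(actuals) == 0:
        raise ValueError("at least one value is required")
    total = sum(abs(a - p) for a, p in zip(actuals, predictions))
    return total / len(actuals)
```

For example, actual values `[3, 5]` and predictions `[2, 7]` have absolute errors 1 and 2, so the MAE is 1.5. Lower is better, and every unit of error counts equally, regardless of sign.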
Additionally, in the event of a tie for the top score, solutions will be ranked on how effectively their visualizations convey the analysis and the modeling process.
- Accuracy = Number of Correct Predictions/Total number of Predictions
- Chester Gan
- Khalil Henci