Module 1: Background: Decision Trees and Random Forests
- Decision Trees: Fitting Step Functions (see the sketch after this list)
- Decision Trees vs Linear Regression
- Random Forests: Definition and Motivation
- Weaknesses of Random Forests
- Why Feature Engineering Matters
- Exercise: NBA Winner Prediction
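To make the contrast between tree-based step functions and linear fits concrete, here is a minimal sketch in scikit-learn; the synthetic dataset and parameter values are illustrative assumptions, not taken from the course materials.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)   # piecewise-constant (step) fit
line = LinearRegression().fit(X, y)                   # one global line
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

X_grid = np.linspace(0, 10, 500).reshape(-1, 1)
# The tree predicts a constant within each leaf's interval, so its curve is a
# step function with at most 2**max_depth = 8 levels; the forest averages many
# such step functions into a smoother (but still piecewise-constant) curve.
print(len(np.unique(tree.predict(X_grid))))
print(len(np.unique(forest.predict(X_grid))))
```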
Module 2: Gradient Boosting: Definition and History
- Boosting and Base Learners
- Boosting as Gradient Descent
- Role of the Loss Function
- Gradient Boosting vs Random Forest
- Gradient Boosting in Scikit-Learn
- Parameters of Gradient Boosting: Which Are Most Important?
- Intro to Parameter Tuning and Early Stopping (see the sketch after this list)
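As a preview of the scikit-learn material, here is a minimal sketch showing the key parameters discussed above, including built-in early stopping; the dataset and the specific values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(
    n_estimators=1000,        # upper bound on the number of boosting rounds
    learning_rate=0.05,       # shrinkage: smaller values need more rounds
    max_depth=3,              # depth of each base-learner tree
    subsample=0.8,            # row sampling per round (stochastic boosting)
    validation_fraction=0.1,  # held out internally for early stopping
    n_iter_no_change=20,      # stop once validation loss stops improving
    random_state=0,
).fit(X_tr, y_tr)

# Rounds actually used (chosen by early stopping) and test accuracy.
print(gbm.n_estimators_, gbm.score(X_te, y_te))
```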
Module 3: Review of Gradient Boosting Packages
- Example: Predicting House Prices
- Gradient Boosting with XGBoost
- Parameter Tuning: Grid Search (see the sketch after this list)
- Exercise: Write Your Own Grid Search
- Parameter Tuning: Bayesian Optimization
- LightGBM and CatBoost
- Handling of Missing Values
- Handling of Categorical Variables
- StructureBoost for Structured Categorical Variables
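To preview the tuning workflow, here is a minimal grid-search sketch using XGBoost's scikit-learn wrapper; the dataset, grid, and scoring choice are illustrative assumptions (and `fetch_california_housing` downloads data on first use).

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBRegressor

X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

grid = GridSearchCV(
    XGBRegressor(n_estimators=300, random_state=0),
    param_grid={
        "max_depth": [3, 5, 7],        # depth of each tree
        "learning_rate": [0.03, 0.1],  # shrinkage
        "subsample": [0.7, 1.0],       # row sampling per round
    },
    scoring="neg_root_mean_squared_error",
    cv=3,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_)
print(grid.score(X_te, y_te))  # negative RMSE on the held-out split
```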
Module 4: Interpreting and Understanding Gradient Boosting Models
- Global vs Local Explanations
- What "Feature Importances" actually measure
- ICE-plots for Global Interpretations
- ICE-plots to Assess Model Quality
- SHAP and the Shapley Value (see the sketch after this list)
- Exploring Interactivity
- Caveats to Interpreting Models
- Exercise: Explaining the House Prediction Model
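Here is a minimal sketch of two of the interpretation tools above: an ICE plot via scikit-learn and SHAP values via the `shap` package. The model and data are illustrative assumptions, and `shap` is assumed to be installed.

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# ICE: one curve per row, tracing the prediction as feature 0 is varied
# while the row's other features are held fixed.
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="individual")

# SHAP: per-row, per-feature attributions that sum (with the base value)
# to each row's prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(shap_values.shape)  # (500, 5): one attribution per row and feature
```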
Module 5: Application to Medical Data
- Exercise: You Build the Model!
- Data Exploration
- The Histogram Pair Function (a sketch follows this list)
- Building a Model
- Tuning Parameters
- Evaluating Quantitatively and Qualitatively
- Gaining Insights from a Model
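As a rough idea of what a histogram-pair plot does, here is a hypothetical sketch: overlaid, normalized histograms of one feature split by a binary outcome, a quick way to eyeball a feature's signal during data exploration. The function name and signature here are illustrative, not the course's own implementation.

```python
import matplotlib.pyplot as plt
import numpy as np

def histogram_pair(feature, target, bins=30):
    """Hypothetical helper: plot `feature`'s distribution for target==0 vs target==1."""
    feature, target = np.asarray(feature, dtype=float), np.asarray(target)
    edges = np.histogram_bin_edges(feature[~np.isnan(feature)], bins=bins)
    plt.hist(feature[target == 0], bins=edges, alpha=0.5, density=True, label="target = 0")
    plt.hist(feature[target == 1], bins=edges, alpha=0.5, density=True, label="target = 1")
    plt.legend()
    plt.show()

# Usage (column names are hypothetical):
# histogram_pair(df["age"], df["readmitted"])
```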