Did You Miss the Live Session?
You can still view the live recording and all course materials by registering below.
Instructor: Brian Lucena
Training duration: 4 hours
Subscribe now and get a 14-day free trial.
Sign up for a Basic or Premium Plan and get a 10-35% additional discount on Live Training.
Instructor
Brian Lucena, PhD
Principal | Numeristical
By the end of the course, participants will be able to:
- Explain how Gradient Boosting methods work as an ensemble of decision trees
- Understand the various details and choices available for Gradient Boosting models
- Know the various Gradient Boosting packages and the capabilities, strengths, and weaknesses of each
- Know how to approach a data set, build a simple model, and then improve it
- Know how to experiment with the parameters and use grid search in a principled way
- Know how to evaluate, interpret, calibrate, and explain your model to a lay audience
Course Abstract
This intensive, hands-on course focuses on using Gradient Boosting for classification and regression problems. Working with real data sets, participants will gain valuable experience training, evaluating, and drawing conclusions from Gradient Boosting models.
At the end of the course, participants will feel confident that they understand all details and parameters behind Gradient Boosting, and be able to present, criticize, and defend the models they create.
Course Schedule
Lesson 1: How Gradient Boosting works
We will review the concepts starting from the definition of the Decision Tree, through Random Forests to how Gradient Boosting can be seen as gradient descent where each step is a tree. Along the way, we will highlight the small details that make a big difference when it comes to configuring Gradient Boosting models.
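The "gradient descent where each step is a tree" idea above can be sketched in a few lines. The sketch below is our own illustration, not course material: for squared-error regression, the negative gradient of the loss is just the residual, so each boosting round fits a depth-1 stump to the residuals and takes a small step (the learning rate). Names like `fit_stump` and `gradient_boost` are made up for this example.

```python
# Minimal gradient boosting for 1-D regression with squared loss.
# Each round fits a stump to the residuals (the negative gradient),
# then adds learning_rate * stump to the running prediction.

def fit_stump(x, residuals):
    """Find the threshold split on feature x that best fits the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def gradient_boost(x, y, n_trees=20, learning_rate=0.3):
    """Return a predict function: a constant plus an additive ensemble of stumps."""
    f0 = sum(y) / len(y)  # initial prediction: the mean
    stumps, preds = [], [f0] * len(y)
    for _ in range(n_trees):
        residuals = [yi - p for yi, p in zip(y, preds)]  # -gradient of squared loss
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(xi) for p, xi in zip(preds, x)]
    return lambda xi: f0 + learning_rate * sum(s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
model = gradient_boost(x, y)
```

After 20 rounds, the ensemble recovers the two obvious clusters in `y`: predictions near 1 for small `x` and near 3 for large `x`. Real packages generalize this with deeper trees, other losses, and regularization.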
Lesson 2: Review of Gradient Boosting Packages
We will present and use all the major Gradient Boosting implementations, including scikit-learn, XGBoost, CatBoost, LightGBM, and StructureBoost, demonstrating the relative strengths and weaknesses of each one.
Lesson 3: Training in Practice, Setting Parameters, Cross-Validation
We will work hands-on with several data sets and demonstrate best practices for exploring data and experimenting with model parameters. We will show the value of early stopping, and demonstrate how best to use it in a cross-validated setting. Along the way, we will write a grid search function from scratch that avoids many of the pitfalls in existing functions and demonstrate how to use it in a "targeted" fashion.
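To give a flavor of what "a grid search function from scratch" might look like (the course's actual function will differ), here is a minimal skeleton: it enumerates every parameter combination with `itertools.product` and keeps the best-scoring one. `train_and_score` is a hypothetical stand-in for a real fit-and-validate routine, e.g. one that uses early stopping internally.

```python
# Bare-bones grid search: try every combination, keep the best score.
from itertools import product

def grid_search(param_grid, train_and_score):
    """param_grid: dict of parameter name -> list of candidate values.
    train_and_score: callable taking a params dict, returning a score
    (higher is better, e.g. cross-validated accuracy).
    Returns (best_score, best_params)."""
    names = list(param_grid)
    best_score, best_params = float("-inf"), None
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

# Toy stand-in scorer that peaks at learning_rate=0.1, max_depth=3.
def toy_score(params):
    return -abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 3)

grid = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [2, 3, 5]}
score, params = grid_search(grid, toy_score)
```

The "targeted" usage mentioned above amounts to running this on a coarse grid first, then re-running on a finer grid around the winner rather than exhaustively searching everything at once.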
Lesson 4: Evaluating and Understanding the Model
After building a model, we will work through various ways to analyze and understand the model. We will review all the major metrics used, discuss when they are relevant or irrelevant, and learn how to put them in their proper context. This includes:
- Looking at ICE plots to understand the practical impact of individual variables and if/how they interact with others
- Using the SHAP package and its values to give meaningful "reasons" for a specific prediction
- Examining the calibration of the model, understanding the consequences of poor calibration, and learning how to fix it
By the end, you will be able to describe how the model works, which variables have the most impact, and in which scenarios one should be cautious when applying the model.
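The calibration check described above can be illustrated with nothing but the standard library (the course uses dedicated tooling; this hand-rolled sketch is ours). The idea: bin the model's predicted probabilities and compare the mean prediction in each bin with the observed positive rate. For a well-calibrated model, the two numbers are close in every bin.

```python
# Hand-rolled reliability table: mean predicted probability vs. observed
# positive rate, per probability bin. Close values = good calibration.

def reliability_table(probs, labels, n_bins=4):
    """Return a list of (mean_predicted, observed_rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for contents in bins:
        if not contents:
            continue
        mean_p = sum(p for p, _ in contents) / len(contents)
        obs = sum(y for _, y in contents) / len(contents)
        table.append((mean_p, obs, len(contents)))
    return table

# A perfectly calibrated toy example: the 0.25-probability events
# occur 1 time in 4, and the 0.75-probability events 3 times in 4.
probs = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
table = reliability_table(probs, labels)
```

A model whose table shows, say, a 0.9 mean prediction against a 0.6 observed rate is overconfident, which is exactly the kind of consequence of poor calibration the lesson addresses.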
Who will be interested in this course?
This course is geared to data scientists of all levels who wish to gain a deep understanding of Gradient Boosting and how to apply it to real-world situations. The ideal participant will have some experience with building models.
What knowledge and skills should you have?
You should know the Python data science toolkit (numpy, pandas, scikit-learn, matplotlib) and have experience fitting models on training sets, making predictions on test sets, and evaluating the quality of the model with metrics.
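As a concrete benchmark for those prerequisites, you should be comfortable with a workflow like the following scikit-learn sketch: split data, fit a model, predict, and score with a metric. (The specific dataset and estimator here are just illustrative choices.)

```python
# The baseline fit/predict/evaluate loop the course assumes you know.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, then a train/test split.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit on the training set, predict on the held-out test set, score.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
```

If each line here is familiar, you have the background the course expects; the course builds on this loop rather than teaching it.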
Have questions?
What is included in your ticket?
- Access to the on-demand recording
- Certificate of completion