Did You Miss the Live Session?

You can still view the live recording and all course materials by registering below.

Instructor: Brian Lucena

Training duration: 4 hours

Price with 20% discount

Regular Price: $210.00.


Instructor

Principal | Numeristical

Brian Lucena, PhD

Brian Lucena is Principal at Numeristical and the creator of StructureBoost, ML-Insights, and SplineCalib. His mission is to enhance the understanding and application of modern machine learning and statistical techniques. He does this through academic research, open-source software development, and educational content such as live-stream classes and interactive Jupyter notebooks. He also consults for organizations of all sizes, from small startups to large public enterprises. In previous roles, he served as SVP of Analytics at PCCI, Principal Data Scientist at Clover Health, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions, including UC Berkeley, Brown, USF, and the Metis Data Science Bootcamp.

By the end of the course, participants will be able to:

  • Explain how Gradient Boosting methods work as an ensemble of decision trees

  • Understand the various details and choices available for Gradient Boosting models

  • Know the various Gradient Boosting packages and the capabilities, strengths, and weaknesses of each

  • Know how to approach a data set, build a simple model, and then improve it

  • Know how to experiment with the parameters and use grid search in a principled way

  • Know how to evaluate, interpret, calibrate, and explain your model to a lay audience

Course Abstract

This intensive, hands-on course focuses on using Gradient Boosting for classification and regression problems. Participants will work with real data sets and gain valuable experience training, evaluating, and drawing conclusions from Gradient Boosting models.

At the end of the course, participants will feel confident that they understand all details and parameters behind Gradient Boosting, and be able to present, criticize, and defend the models they create.

Course Schedule

Lesson 1: How Gradient Boosting works

We will review the concepts, starting from the definition of a decision tree, moving through Random Forests, and ending with how Gradient Boosting can be seen as gradient descent in which each step is a tree. Along the way, we will highlight the small details that make a big difference when configuring Gradient Boosting models.
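The "gradient descent where each step is a tree" view can be illustrated with a minimal sketch. It is not taken from the course materials: it assumes squared-error loss (so the negative gradient is simply the residual), uses scikit-learn's DecisionTreeRegressor as the base learner, and the function names and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Boost shallow regression trees on squared-error loss (illustrative only)."""
    base = float(np.mean(y))              # initial constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred              # negative gradient of 0.5 * (y - pred)**2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)            # the tree approximates the gradient step
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return {"base": base, "trees": trees, "learning_rate": learning_rate}


def predict_boosted_trees(model, X):
    """Sum the base prediction and the scaled contribution of every tree."""
    pred = np.full(X.shape[0], model["base"])
    for tree in model["trees"]:
        pred += model["learning_rate"] * tree.predict(X)
    return pred


# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_boosted_trees(X, y)
mse_constant = float(np.mean((y - y.mean()) ** 2))
mse_boosted = float(np.mean((y - predict_boosted_trees(model, X)) ** 2))
print(f"constant: {mse_constant:.3f}  boosted: {mse_boosted:.3f}")
```

The learning rate and tree depth here are the kind of "small details" that change how quickly the training error falls and how easily the model overfits.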

Lesson 2: Review of Gradient Boosting Packages

We will present and use all the major Gradient Boosting implementations, including scikit-learn, XGBoost, CatBoost, LightGBM, and StructureBoost, demonstrating the relative strengths and weaknesses of each one.

Lesson 3: Training in Practice, Setting Parameters, Cross-Validation

We will work hands-on with several data sets and demonstrate best practices for exploring data and experimenting with model parameters.  We will show the value of early stopping, and demonstrate how best to use it in a cross-validated setting.  Along the way, we will write a grid search function from scratch that avoids many of the pitfalls in existing functions and demonstrate how to use it in a "targeted" fashion.
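One possible way to combine early stopping with cross-validation is sketched below. It is not the grid search function developed in the course: the helper names, the synthetic data, and the small grid are all hypothetical. The idea is to record the validation loss after every boosting stage in each fold, average those curves across folds, and pick the number of trees that minimizes the average, rather than stopping each fold independently.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold


def cv_loss_curve(X, y, params, max_trees=200, n_splits=5, seed=0):
    """Validation MSE after each boosting stage, averaged over the folds."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    curves = []
    for train_idx, valid_idx in folds.split(X):
        model = GradientBoostingRegressor(
            n_estimators=max_trees, random_state=seed, **params
        )
        model.fit(X[train_idx], y[train_idx])
        # staged_predict yields the prediction after 1, 2, ..., max_trees trees.
        curves.append(
            [mean_squared_error(y[valid_idx], pred)
             for pred in model.staged_predict(X[valid_idx])]
        )
    return np.mean(curves, axis=0)


def targeted_grid_search(X, y, grid, **cv_kwargs):
    """Evaluate each parameter setting at its own best number of trees."""
    results = []
    for params in grid:
        curve = cv_loss_curve(X, y, params, **cv_kwargs)
        best_stage = int(np.argmin(curve))
        results.append({
            "params": params,
            "n_trees": best_stage + 1,
            "cv_mse": float(curve[best_stage]),
        })
    return sorted(results, key=lambda r: r["cv_mse"])


# Synthetic data and a deliberately small grid, both assumed for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
grid = [{"max_depth": depth, "learning_rate": 0.1} for depth in (1, 2, 3)]
results = targeted_grid_search(X, y, grid, max_trees=100, n_splits=3)
for row in results:
    print(row)
```

Because the number of trees is chosen per setting from the averaged curve, it does not need to be a grid dimension, which keeps the search small enough to vary one or two parameters at a time.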

Lesson 4: Evaluating and Understanding the Model

After building a model, we will work through various ways to analyze and understand the model.  We will review all the major metrics used, discuss when they are relevant or irrelevant, and learn how to put them in their proper context.  This includes:

- Looking at ICE plots to understand the practical impact of individual variables and whether and how they interact with others

- Using the SHAP package to compute SHAP values, and using those values to give meaningful "reasons" for a specific prediction

- Examining the calibration of the model, understanding the consequences of poor calibration, and learning how to fix it

By the end, you will be able to describe how the model works, which variables have the most impact, and in which scenarios one should be cautious applying the model.
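Two of the checks listed above, ICE curves and calibration, can be sketched with numpy and scikit-learn alone. This is a minimal illustration under stated assumptions, not the course's notebook code: the data is synthetic, `ice_curves` is a hypothetical helper that computes the curves by hand, and SHAP values are left out because they require the separate SHAP package.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split


def ice_curves(model, X, feature, grid):
    """Predicted probability for each row as one feature sweeps over the grid."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value        # overwrite a single feature for all rows
        curves[:, j] = model.predict_proba(X_mod)[:, 1]
    return curves


# Synthetic data, assumed for illustration.
X, y = make_classification(
    n_samples=2000, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# One ICE curve per row: curves that are not parallel hint at interactions.
grid = np.linspace(X_test[:, 0].min(), X_test[:, 0].max(), 20)
curves = ice_curves(model, X_test[:50], feature=0, grid=grid)

# Calibration: compare the mean predicted probability with the observed
# frequency of the positive class within each probability bin.
prob_true, prob_pred = calibration_curve(
    y_test, model.predict_proba(X_test)[:, 1], n_bins=10
)
calibration_gap = np.abs(prob_true - prob_pred)
print("largest gap between predicted and observed rate:", calibration_gap.max())
```

Plotting each row of `curves` against `grid` gives the ICE plot, and plotting `prob_true` against `prob_pred` gives a reliability diagram; large gaps from the diagonal suggest the probabilities need recalibration before being reported.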


Who will be interested in this course?

This course is geared to data scientists of all levels who wish to gain a deep understanding of Gradient Boosting and how to apply it to real-world situations. The ideal participant will have some experience with building models.

What knowledge and skills should you have?

You should know the Python data science toolkit (numpy, pandas, scikit-learn, matplotlib) and have experience fitting models on training sets, making predictions on test sets, and evaluating the quality of the model with metrics.

What is included in your ticket?

  • Access to the on-demand recording

  • Certificate of completion
