Live training with Brian Lucena starts on October 28th at 1 PM (ET)

Training duration: 4 hours (Hands-on)

Price with 10% discount: $189.00

Regular Price: $210.00

Subscribe now and start a 7-day free trial

Sign up for the Premium Plan and get a 10-35% additional discount on live training

Instructor Bio:

Brian Lucena, PhD

Principal | Numeristical

Brian Lucena is Principal at Numeristical and the creator of StructureBoost, ML-Insights, and SplineCalib. His mission is to enhance the understanding and application of modern machine learning and statistical techniques. He does this through academic research, open-source software development, and educational content such as live stream classes and interactive Jupyter notebooks. Additionally, he consults for organizations of all sizes from small startups to large public enterprises. In previous roles, he has served as SVP of Analytics at PCCI, Principal Data Scientist at Clover Health, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.


Learning Objectives

  • Understand in detail how Gradient Boosting models are fit as an ensemble of decision trees and apply that understanding to the feature engineering process.

  • Understand the various parameters of Gradient Boosting, their relative importance, and how to choose them appropriately.

  • Gain familiarity with the various Gradient Boosting packages and the capabilities, strengths, and weaknesses of each.

  • Learn how to interpret, understand, and evaluate a model, both qualitatively and quantitatively.

DIFFICULTY LEVEL: INTERMEDIATE

Course Outline

Module 1: Background: Decision Trees and Random Forests


- Decision Trees: Fitting Step Functions (see the sketch after this outline)

- Decision Trees vs Linear Regression

- Random Forests: Definition and Motivation

- Weaknesses of Random Forests

- Why Feature Engineering Matters

- Exercise: NBA Winner Prediction
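
To give a feel for the module's starting point, here is a minimal sketch (our own illustration with synthetic data and arbitrary parameters, not the course's notebooks) of how a depth-limited decision tree fits a step function while linear regression fits a single global line:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    # Synthetic 1-D data: a smooth curve plus noise (illustrative only).
    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
    y = np.sin(X.ravel()) + rng.normal(0, 0.2, 200)

    tree = DecisionTreeRegressor(max_depth=3).fit(X, y)  # piecewise-constant fit
    line = LinearRegression().fit(X, y)                  # one global straight line

    grid = np.linspace(0, 10, 500).reshape(-1, 1)
    tree_pred = tree.predict(grid)  # a step function: constant within each leaf
    line_pred = line.predict(grid)  # a straight line: cannot capture the curvature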


Module 2: Gradient Boosting: Definition and History


- Boosting and Base Learners

- Boosting as Gradient Descent

- Role of the Loss Function

- Gradient Boosting vs Random Forest

- Gradient Boosting in Scikit-Learn

- Parameters of Gradient Boosting: which are most important?

- Intro to Parameter Tuning and Early Stopping (see the sketch after this outline)
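
As a preview of this module, here is a minimal sketch of gradient boosting with early stopping in scikit-learn; the dataset and parameter values are illustrative assumptions, not the course's examples:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # validation_fraction + n_iter_no_change enable simple early stopping:
    # boosting halts once the held-out score stops improving.
    gbm = GradientBoostingClassifier(
        n_estimators=500,    # upper bound on boosting rounds
        learning_rate=0.05,  # shrinkage applied to each new tree
        max_depth=3,         # depth of each base-learner tree
        validation_fraction=0.1,
        n_iter_no_change=10,
        random_state=0,
    )
    gbm.fit(X_train, y_train)
    print("boosting rounds actually used:", gbm.n_estimators_)
    print("test accuracy:", gbm.score(X_test, y_test))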


Module 3: Review of Gradient Boosting Packages


- Example: Predicting House Prices

- Gradient Boosting with XGBoost

- Parameter Tuning: Grid Search (see the sketch after this outline)

- Exercise: Write your own Grid Search

- Parameter Tuning: Bayesian Optimization

- LightGBM and CatBoost

- Handling of Missing Values

- Handling of Categorical Variables

- StructureBoost for Structured Categorical Variables
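
For a flavor of the package tour, here is a minimal sketch of grid-search tuning with XGBoost's scikit-learn interface; the dataset and grid are our own placeholders, not the course's house-price example:

    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBRegressor

    X, y = fetch_california_housing(return_X_y=True)

    # A small illustrative grid over the usual high-impact parameters.
    param_grid = {
        "max_depth": [3, 5],
        "learning_rate": [0.05, 0.1],
        "n_estimators": [200, 400],
    }

    search = GridSearchCV(
        XGBRegressor(objective="reg:squarederror", random_state=0),
        param_grid,
        cv=3,
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    print(search.best_params_)
    print("CV RMSE:", -search.best_score_)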


Module 4: Interpreting and Understanding Gradient Boosting Models


- Global vs Local Explanations

- What "Feature Importances" actually measure

- ICE-plots for Global Interpretations (see the sketch after this outline)

- ICE-plots to Assess Model Quality

- SHAP and the Shapley Value

- Exploring interactivity

- Caveats to Interpreting Models

- Exercise: Explaining the House Prediction Model
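
As a taste of the interpretation toolkit, here is a minimal sketch of ICE curves using scikit-learn's inspection module; the model and feature choice are placeholders, not the course's materials:

    import matplotlib.pyplot as plt
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.inspection import PartialDependenceDisplay

    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    model = GradientBoostingRegressor(random_state=0).fit(X, y)

    # kind="individual" draws one curve per sampled row (an ICE plot);
    # kind="both" also overlays the average partial-dependence line.
    PartialDependenceDisplay.from_estimator(
        model, X, features=["MedInc"], kind="both", subsample=100
    )
    plt.show()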


Module 5: Application to Medical Data


- Exercise: You build the model!

- Data Exploration

- The Histogram Pair function

- Building a Model

- Tuning Parameters

- Evaluating Quantitatively and Qualitatively

- Gaining Insights from a Model

Course Abstract

Gradient Boosting is widely used in prediction problems across industry and academia. Common applications include fraud detection, home price prediction, and loan default prediction, just to name a few. This course is an intensive hands-on workshop with real data sets focused on using Gradient Boosting for classification and regression problems. Participants will gain valuable experience training, evaluating, and drawing conclusions from Gradient Boosting models. They will gain familiarity with Gradient Boosting packages such as XGBoost, LightGBM, CatBoost, and StructureBoost. By the end of the course, participants will feel confident that they understand the details and parameters behind Gradient Boosting, and be able to present, criticize, and defend the models they create.

What background knowledge should you have?

  • This course is geared to data scientists of all levels who wish to gain a deep understanding of Gradient Boosting and how to apply it to real-world situations. The ideal participant will have some experience with building models. They should know the Python data science toolkit (numpy, pandas, scikit-learn, matplotlib) and have experience fitting models on training sets, making predictions on test sets, and evaluating model quality with metrics. A minimal sketch of that baseline workflow appears below.
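
As a concrete benchmark for the assumed background, here is that workflow in miniature; the dataset, model, and metric are arbitrary illustrations, not course requirements:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Fit on a training set, predict on a held-out test set, score with a metric.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    probs = model.predict_proba(X_test)[:, 1]
    print("test AUC:", roc_auc_score(y_test, probs))

If you can follow this end to end, you have the background the course assumes.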

What is included in your ticket?

  • Access to the live training and a Q&A session with the instructor

  • Access to the on-demand recording

  • Certificate of completion
