Learning Objectives

  • How to approach data exploration

  • How to assess the "coherence" of a model

  • How to interpret complicated models (such as those produced by Gradient Boosting or Random Forests)

  • How to ascribe reasons to individual predictions

Course Outline

Module 1: Understanding the overall dynamics of your data and your model

- Using sophisticated modeling packages (like XGBoost) to understand more complicated dynamics in the data 

- How to approach data exploration to understand more complicated relationships between the variables in your data 

- Why the "coherence" of a model is important - arguably, on the same level as its predictive performance 

- How to assess the "coherence" of a model using ICE plots 
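
As a rough illustration of the ICE idea in Module 1, the sketch below computes one curve per row by sweeping a single feature over a grid while holding that row's other features fixed. It is a minimal sketch under stated assumptions, not the course's own code: scikit-learn's GradientBoostingRegressor stands in for XGBoost, the data is synthetic, and `ice_curves` is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def ice_curves(model, X, feature_idx, grid):
    """Return an array of shape (n_rows, len(grid)): one ICE curve per row of X."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # vary one feature, keep the rest as observed
        curves[:, j] = model.predict(X_mod)
    return curves


# Synthetic data (an assumption for illustration): y depends on x0 quadratically.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 3))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=300)

# Stand-in for an XGBoost model; any object with .predict would work here.
model = GradientBoostingRegressor(random_state=0).fit(X, y)

grid = np.linspace(-2, 2, 21)
curves = ice_curves(model, X[:50], feature_idx=0, grid=grid)

# Averaging the ICE curves gives the partial dependence curve.
pdp = curves.mean(axis=0)
```

Plotting each row of `curves` against `grid` gives the ICE plot. If the individual curves follow a shape that makes sense for the problem (here, roughly U-shaped in the first feature), that is one informal sign of a coherent model.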


Module 2: Understanding and explaining individual predictions from the model 

- How to ascribe "reasons" to individual predictions 

- How to "consolidate" features to make the reasons more coherent and understandable 

- Using visualizations, both built independently and from the SHAP package
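
The "consolidation" step in Module 2 can be sketched as follows, assuming you already have an additive matrix of per-feature contributions (for example, SHAP values) with one row per prediction. Because such contributions add up, related features can be summed into a single, more readable reason. The helper names `consolidate` and `top_reasons`, the feature names, and the numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np


def consolidate(contribs, feature_names, groups):
    """Sum per-feature contributions within each named group.

    groups maps a group name to the feature names it absorbs;
    features not listed in any group keep their own name.
    """
    index = {name: i for i, name in enumerate(feature_names)}
    grouped = {f for members in groups.values() for f in members}
    full = dict(groups)
    for name in feature_names:
        if name not in grouped:
            full[name] = [name]  # ungrouped features pass through unchanged
    names = list(full)
    columns = [contribs[:, [index[f] for f in full[g]]].sum(axis=1) for g in names]
    return np.column_stack(columns), names


def top_reasons(row, names, k=2):
    """Return the k contributions with the largest absolute value for one prediction."""
    order = np.argsort(-np.abs(row))[:k]
    return [(names[i], float(row[i])) for i in order]


# Made-up contribution matrix: 2 predictions x 4 features.
feature_names = ["age", "bmi", "systolic_bp", "diastolic_bp"]
contribs = np.array([
    [0.10, -0.05, 0.30, 0.25],
    [-0.20, 0.40, -0.02, 0.01],
])

# Treat the two blood pressure readings as a single reason.
groups = {"blood_pressure": ["systolic_bp", "diastolic_bp"]}
grouped, names = consolidate(contribs, feature_names, groups)
reasons = top_reasons(grouped[0], names)
```

For the first prediction, the two blood pressure features are individually the largest contributors; after consolidation they appear as one "blood_pressure" reason, which is usually easier to communicate. The row totals are unchanged by grouping, so the consolidated reasons still account for the same overall prediction.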

Instructor's Bio: Brian Lucena, PhD

Brian Lucena is a Principal at Lucena Consulting and a consulting Data Scientist at Agentero. An applied mathematician in every sense, he is passionate about applying modern machine learning techniques to understand the world and act upon it. In previous roles, he has served as SVP of Analytics at PCCI, Principal Data Scientist at Clover Health, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC Berkeley, Brown, USF, and the Metis Data Science Bootcamp.

In this course, we will work hands-on using XGBoost with real-world data sets to demonstrate how to approach data with the twin goals of prediction and understanding, so that improvements in one yield improvements in the other. Using modern tools such as Individual Conditional Expectation (ICE) plots and SHAP, along with a sense of curiosity, we will extract insights that simpler methods could not provide.

Who will be interested in this course?

  • Background in Python, NumPy, Pandas, and Scikit-learn