Course Abstract

Training duration: 90 min (Hands-on)

Supervised Learning is a course series that walks through all steps of the classical supervised machine learning pipeline. We use Python and packages such as scikit-learn, pandas, numpy, and matplotlib. The series covers cross-validation and splitting strategies, evaluation metrics, supervised machine learning algorithms (linear and logistic regression, support vector machines, and tree-based methods such as random forests, gradient boosting, and XGBoost), and interpretability. You can complete the courses in sequence or take individual courses based on your interests.

A crucial part of the ML pipeline is explaining the predictions, which is the focus of part 6 of the series. It can be difficult to understand exactly how supervised machine learning models (especially non-linear ones) arrive at their predictions, which is why ML models are sometimes called black boxes. A black box is fine if predictive accuracy is all we care about, but often predictions alone are not enough and the model needs to provide explanations along with them. For example, if a model predicts that a patient has a certain disease, the doctor needs to be able to explain to the patient how the model made that diagnosis. We start with global feature importance metrics, which measure how strongly each feature contributes to the model's predictions overall. We then move on to local feature importance metrics, which describe how much each feature contributes to the prediction for one specific data point. I demonstrate all metrics on the same dataset to highlight that different metrics rank the features differently, so model interpretability depends on the metric(s) you use.

DIFFICULTY LEVEL: INTERMEDIATE

Learning Objectives

  • Summarize why it is important to explain models

  • Describe why additional tools are necessary to explain non-linear models

  • Review the difference between global and local feature importance metrics

  • Use the coefficients of linear models to measure feature importance

  • Apply permutation feature importance to calculate global feature importances

  • Describe some model-specific approaches to measure global feature importance

  • Describe the intuition behind SHAP values

  • Create force, dependence, and summary plots to aid local interpretability

Instructor Bio:

Andras Zsom, PhD

Lead Data Scientist and Adjunct Lecturer in Data Science | Brown University, Center for Computation and Visualization

Andras Zsom is a Lead Data Scientist in the Center for Computation and Visualization and an Adjunct Lecturer in Data Science at Brown University, Providence, RI, USA. He works with high-level academic administrators to tackle predictive modeling problems and promote data-driven decision making; collaborates with faculty members on data-intensive research projects; and teaches a mandatory course in the Data Science Master's curriculum.

Course Outline

Module 1: Global feature importance metrics in linear models

  • Describe model accuracy vs. interpretability
  • Linear models are easy to interpret but rarely the most accurate
  • Non-linear models are accurate but difficult for humans to interpret, so additional tools are necessary to improve interpretability
  • The difference between global and local explanations
  • How to use the coefficients of linear models as a measure of global feature importance
  • Demonstrate on a dataset why standardizing the features is important (see the sketch below)
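
A minimal sketch of this module's core idea, using scikit-learn's bundled breast cancer data as a stand-in for the course dataset (which is not specified here):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Standardize first so the coefficients are on a comparable scale;
# without scaling, a coefficient's magnitude reflects the feature's units.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
model.fit(X, y)

# Absolute standardized coefficients serve as a global importance metric.
coefs = model.named_steps["logisticregression"].coef_[0]
ranking = sorted(zip(X.columns, np.abs(coefs)), key=lambda t: -t[1])
for name, importance in ranking[:5]:
    print(f"{name}: {importance:.3f}")
```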


Module 2: Global feature importance metrics in non-linear models

  • Show how the permutation feature importance works
  • Describe the pros and cons of permutation feature importance
  • Global feature importance metric in sklearn’s Random Forest
  • Global feature importance in sklearn’s SVM with a linear kernel
  • Five ways to measure global feature importance with XGBoost
  • Review how the various metrics rank the features (see the sketch below)
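
A minimal sketch covering the metrics listed above, again on a stand-in dataset; the five XGBoost importance types are those accepted by Booster.get_score():

```python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Model-agnostic: permutation importance, measured on held-out data.
rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
print(perm.importances_mean)

# Model-specific: impurity-based importances built into Random Forest.
print(rf.feature_importances_)

# Model-specific: an SVM with a linear kernel exposes its coefficients.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X_train, y_train)
print(svm.named_steps["svc"].coef_[0])

# XGBoost's booster supports five global importance types.
booster = xgb.XGBClassifier().fit(X_train, y_train).get_booster()
for kind in ["weight", "gain", "cover", "total_gain", "total_cover"]:
    print(kind, booster.get_score(importance_type=kind))
```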


Module 3: Local feature importance metrics

  • Describe Shapley values and their game-theory background
  • How Shapley values are applied in machine learning
  • The SHAP package
  • Calculate local feature importances
  • Create force, dependence, and summary plots (see the sketch below)
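
A minimal sketch of this workflow using the SHAP package's classic TreeExplainer API on a stand-in XGBoost model; the course's own dataset and exact calls may differ:

```python
import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgb.XGBClassifier().fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one row of local importances per data point

shap.initjs()  # needed for the interactive force plot in a notebook

# Force plot: local explanation of a single prediction.
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])

# Dependence plot: one feature's values vs. its SHAP values across the data.
shap.dependence_plot("mean radius", shap_values, X)

# Summary plot: a global overview assembled from the local values.
shap.summary_plot(shap_values, X)
```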

Background knowledge

  • Python coding experience

  • Familiarity with pandas and numpy

  • Prior experience with scikit-learn and matplotlib is a plus but not required

Applicable Use-cases

  • The dataset can be expressed as a 2D feature matrix, with columns as features and rows as data points (see the sketch after this list)

  • A continuous or categorical target variable exists

  • Examples include, but are not limited to, fraud detection, predicting whether patients have a certain illness, predicting the selling or rental price of properties, and predicting customer satisfaction
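
For illustration, a minimal sketch of this layout with hypothetical property-price columns:

```python
import pandas as pd

# Hypothetical data, just to illustrate the expected layout.
df = pd.DataFrame({
    "sqft":     [850, 1200, 640],
    "bedrooms": [2, 3, 1],
    "price":    [250_000, 410_000, 180_000],
})

X = df[["sqft", "bedrooms"]]  # 2D feature matrix: rows = data points, columns = features
y = df["price"]               # continuous target (a categorical target also works)
```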