Learning Objectives

  • Describe the elements of the confusion matrix

  • Describe metrics derived from the confusion matrix such as accuracy, precision, recall, and the f_beta score

  • Summarize what the ROC and precision-recall curves and AUC are

  • Review the logloss metric and its properties

  • Outline metrics often used in regression (MSE, RMSE, MAE, R2 score)

  • Calculate the value of each metric given a set’s target variable and predictions from an ML model

  • Calculate the baseline of each metric given a set’s target variable

  • Choose an appropriate evaluation metric given your ML problem

Course Outline

Module 1: Hard predictions in classification

  • Describe the difference between hard and soft predictions

  • Review the confusion matrix in binary and multiclass classification

  • Explain why the confusion matrix cannot be used directly to compare models

  • Derive single-number evaluation metrics based on the confusion matrix

  • Discuss the pros and cons of each metric and under what conditions they should be used

  • Calculate the baseline of each metric
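
As a rough illustration of the Module 1 topics, the sketch below counts the four cells of a binary confusion matrix and derives accuracy, precision, recall, and the f_beta score from them. The labels and hard predictions are made up for the example, and the helper names (confusion_counts, fbeta) are hypothetical; scikit-learn offers comparable functions in sklearn.metrics (confusion_matrix, accuracy_score, precision_score, recall_score, fbeta_score).

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def fbeta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up data: 6 points in class 0 and 4 points in class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = fbeta(precision, recall, beta=1.0)

# One common baseline: always predict the most frequent class.
majority = max(set(y_true), key=y_true.count)
baseline_accuracy = y_true.count(majority) / len(y_true)

print(tn, fp, fn, tp)
print(accuracy, precision, recall, f1, baseline_accuracy)
```

Here the model's accuracy (0.7) is only slightly above the majority-class baseline (0.6), which is one reason a single metric is usually read next to its baseline.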


Module 2: Working with predicted probabilities in classification

  • Predicted probabilities in sklearn

  • Review the logloss metric

  • The ROC and precision-recall curves and the AUC

  • Discuss the pros and cons of each metric and under what conditions they should be used

  • Calculate the baseline of each metric
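
The Module 2 metrics work on predicted probabilities rather than hard predictions. The following is a minimal sketch, using only the standard library and made-up data, of the logloss and of the ROC AUC computed as the fraction of (positive, negative) pairs that the model ranks correctly. The function names are hypothetical; in scikit-learn the comparable calls are sklearn.metrics.log_loss and sklearn.metrics.roc_auc_score.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood for binary labels coded as 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def roc_auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

ll = log_loss(y_true, y_prob)
auc = roc_auc(y_true, y_prob)

# One common baseline: predict the class 1 fraction for every point.
prevalence = sum(y_true) / len(y_true)
baseline_ll = log_loss(y_true, [prevalence] * len(y_true))
baseline_auc = roc_auc(y_true, [prevalence] * len(y_true))

print(ll, auc, baseline_ll, baseline_auc)
```

A constant prediction ties every pair, so its AUC is 0.5, while its logloss depends on how balanced the classes are.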


Module 3: Regression metrics

  • MSE, RMSE and MAE

  • The R2 score and its properties

  • Discuss the pros and cons of each metric and under what conditions they should be used

  • Calculate/describe the baseline of each metric
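
To make the Module 3 metrics concrete, here is a small sketch with made-up numbers that computes MSE, RMSE, MAE, and the R2 score directly from their definitions. The helper name regression_metrics is hypothetical; scikit-learn provides mean_squared_error, mean_absolute_error, and r2_score in sklearn.metrics.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 as a dictionary."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot  # assumes y_true is not constant
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up target values and model predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

model = regression_metrics(y_true, y_pred)

# One common baseline: predict the mean of the target for every point.
mean_true = sum(y_true) / len(y_true)
baseline = regression_metrics(y_true, [mean_true] * len(y_true))

print(model)
print(baseline)
```

With the mean as the prediction, the R2 score is 0 and the MSE equals the variance of the target, which gives a reference point for judging the model's values.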

Instructor's Bio: Andras Zsom, PhD

Andras Zsom is a Lead Data Scientist in the Center for Computation and Visualization group at Brown University, Providence, RI. He works with high-level academic administrators to tackle predictive modeling problems, collaborates with faculty members on data-intensive research projects, and was the instructor of a data science course offered to the data science master's students at Brown.

Who will be interested in this course?

  • Python coding experience

  • Familiarity with pandas and numpy

  • Prior experience with scikit-learn and matplotlib is a plus but not required