Supervised Learning 3: Evaluation Metrics in Supervised Machine Learning
Describe the elements of the confusion matrix
Describe metrics derived from the confusion matrix, such as accuracy, precision, recall, and the F-beta score
Summarize what the ROC and precision-recall curves are and what the area under the curve (AUC) measures
Review the logloss metric and its properties
Outline metrics often used in regression (MSE, RMSE, MAE, R² score)
Calculate the value of each metric given a set’s target variable and predictions from an ML model
Calculate the baseline of each metric given a set’s target variable
Choose an appropriate evaluation metric given your ML problem
Module 1: Hard predictions in classification
Describe the difference between hard and soft predictions
Review the confusion matrix in binary and multiclass classification
Why can't we use the confusion matrix directly to compare models?
Derive single number evaluation metrics based on the confusion matrix
Discuss the pros and cons of each metric and under what conditions they should be used
Calculate the baseline of each metric
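The hard-prediction metrics in this module can be sketched in plain Python. This is a minimal illustration, not the course's own code: the helper names (`confusion_counts`, `hard_metrics`, `majority_baseline`) are hypothetical, labels are assumed to be 0/1 with 1 as the positive class, and a metric whose denominator is zero is reported as 0.0 by our own convention. In practice, `sklearn.metrics` provides `confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score` and `fbeta_score` for the same quantities. The baseline assumed here predicts the majority class for every point.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for binary hard predictions."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def hard_metrics(y_true, y_pred, beta=1.0, positive=1):
    """Single-number metrics derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumption: report 0.0 when a denominator is zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_beta": f_beta}


def majority_baseline(y_true):
    """Hard baseline: predict the most frequent class for every point."""
    majority = Counter(y_true).most_common(1)[0][0]
    return [majority] * len(y_true)


y_true = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 5)
print(hard_metrics(y_true, y_pred))
print(hard_metrics(y_true, majority_baseline(y_true)))
```

On these toy labels the model reaches 0.8 accuracy with precision, recall and F1 all at 0.75, while the majority-class baseline already scores 0.6 accuracy with zero recall. Comparing against the baseline is what makes the 0.8 meaningful.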
Module 2: Working with predicted probabilities in classification
Predicted probabilities in sklearn
Review the logloss metric
The ROC and precision-recall curves and the AUC
Discuss the pros and cons of each metric and under what conditions they should be used
Calculate the baseline of each metric
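A plain-Python sketch of the probability-based metrics follows. It assumes binary 0/1 labels with both classes present, and scores that are predicted probabilities of class 1. In scikit-learn such scores would typically come from a fitted classifier's `predict_proba(X)[:, 1]` (for 0/1 labels), and `sklearn.metrics` offers `log_loss`, `roc_auc_score` and `average_precision_score`. The helpers below (`log_loss`, `roc_auc`, `average_precision`, `constant_baseline`) are hypothetical stand-ins written for illustration. ROC AUC is computed through its pairwise-ranking interpretation, and the precision-recall area is summarized as step-wise average precision, which is one of several ways to estimate that area.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred holds predicted P(y = 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def roc_auc(y_true, p_pred):
    """ROC AUC as the chance a random positive outranks a random negative."""
    pos = [p for t, p in zip(y_true, p_pred) if t == 1]
    neg = [p for t, p in zip(y_true, p_pred) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties get half credit
    return wins / (len(pos) * len(neg))


def average_precision(y_true, p_pred):
    """Step-wise area under the precision-recall curve."""
    n_pos = sum(y_true)
    pairs = sorted(zip(p_pred, y_true), reverse=True)  # highest score first
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # lower the threshold past every sample tied at this score
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def constant_baseline(y_true):
    """Soft baseline: predict the positive-class fraction for every point."""
    frac = sum(y_true) / len(y_true)
    return [frac] * len(y_true)


y_true = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
p_pred = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.3, 0.7, 0.2, 0.1]
base = constant_baseline(y_true)
for name, fn in [("log loss", log_loss), ("ROC AUC", roc_auc),
                 ("AP", average_precision)]:
    print(name, round(fn(y_true, p_pred), 3),
          "baseline", round(fn(y_true, base), 3))
```

On this toy data the model beats the constant baseline on all three metrics. For the constant baseline, the ROC AUC is 0.5 (every positive-negative pair is a tie), the average precision equals the positive-class fraction (0.4), and the log loss equals the entropy of the class distribution.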
Module 3: Regression metrics
MSE, RMSE and MAE
The R² score and its properties
Discuss the pros and cons of each metric and under what conditions they should be used
Calculate/describe the baseline of each metric
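The regression metrics and their constant-prediction baselines can be sketched the same way. The function names are hypothetical; scikit-learn's `sklearn.metrics` offers `mean_squared_error`, `mean_absolute_error` and `r2_score`, and RMSE is the square root of the MSE. The baselines rely on two standard facts: among constant predictions, the mean of the target minimizes MSE (and yields R² = 0), while the median minimizes MAE.

```python
import math
from statistics import median


def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y, y_hat))


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot; 0 for the mean baseline, negative if worse."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1 - ss_res / ss_tot


y = [3.0, 5.0, 2.0, 7.0, 4.0]
y_hat = [2.5, 5.0, 3.0, 6.0, 4.5]
mean_baseline = [sum(y) / len(y)] * len(y)  # best constant for MSE/RMSE
median_baseline = [median(y)] * len(y)      # best constant for MAE

print("MSE ", mse(y, y_hat), "baseline", mse(y, mean_baseline))
print("RMSE", rmse(y, y_hat), "baseline", rmse(y, mean_baseline))
print("MAE ", mae(y, y_hat), "baseline", mae(y, median_baseline))
print("R2  ", r2(y, y_hat), "baseline", r2(y, mean_baseline))
```

On the toy data the model's MSE is 0.5 against roughly 2.96 (the population variance of the target) for the mean baseline, and its R² is about 0.83, whereas the mean baseline scores exactly 0.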
Andras Zsom is a Lead Data Scientist in the Center for Computation and Visualization group at Brown University, Providence, RI. He works with high-level academic administrators to tackle predictive modeling problems, collaborates with faculty members on data-intensive research projects, and was the instructor of a data science course offered to data science master's students at Brown.
Python coding experience
Familiarity with pandas and numpy
Prior experience with scikit-learn and matplotlib is a plus but not required