Supervised Machine Learning 1: Introduction to machine learning and the bias-variance tradeoff
This course is available only as a part of subscription plans.
Training duration: 90 min (Hands-on)
Describe how a task like spam filtering can be solved with explicit coding instructions vs. a machine learning algorithm that learns from examples (training data)
Summarize the similarities and differences between supervised and unsupervised ML
List the pros and cons of supervised machine learning
Define the mathematical model behind linear and logistic regression
Explain what the loss function is
Describe the two main types of regularization and why regularization is important
Perform a simple train/validation/test split on IID data
Apply linear and logistic regression to datasets
Tune the regularization hyperparameter
Identify models with high bias and high variance
Select the best model and measure its performance on a previously unseen dataset, the test set
Andras Zsom, PhD
Lead Data Scientist and Adjunct Lecturer in Data Science | Brown University, Center for Computation and Visualization
Module 1: Intro to Machine Learning (20 min)
Motivation: why supervised ML is the most successful area of ML
The example of the spam filter: workflow with explicit coding instructions vs. machine learning
The feature matrix and the target variable
Supervised and unsupervised machine learning
The pros and cons of supervised ML
Automation and predictions
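As a rough illustration of the spam filter example in this module, the sketch below contrasts a hand-written rule with a tiny model that learns word weights from labeled examples. The messages, the word-count features, and all function names are invented for illustration and are not taken from the course materials; the "training" step is a deliberately simple word-scoring heuristic, not a production algorithm.

```python
# Toy messages and labels (1 = spam, 0 = not spam); invented for illustration.
messages = [
    "win a free prize now",
    "free money click now",
    "meeting moved to friday",
    "lunch on friday?",
]
y = [1, 1, 0, 0]  # target variable: one label per message

# Approach 1: explicit coding instructions written by hand.
def rule_based_is_spam(text):
    words = text.split()
    return int("free" in words or "prize" in words)

# Approach 2: learn from examples.
# The feature matrix X has one row per message and one column per word.
vocabulary = sorted({word for m in messages for word in m.split()})

def featurize(text):
    words = text.split()
    return [words.count(v) for v in vocabulary]

X = [featurize(m) for m in messages]

# "Training" (a simple heuristic): weight each word by how much more often
# it appears in spam than in non-spam training messages.
weights = [
    sum(row[j] for row, label in zip(X, y) if label == 1)
    - sum(row[j] for row, label in zip(X, y) if label == 0)
    for j in range(len(vocabulary))
]

def learned_is_spam(text):
    score = sum(w * x for w, x in zip(weights, featurize(text)))
    return int(score > 0)

new_message = "click now for money"
print("rule-based:", rule_based_is_spam(new_message))
print("learned:   ", learned_is_spam(new_message))
```

The hand-written rule only knows the words its author thought of, while the learned weights pick up other words that co-occur with spam in the training data. Both approaches remain toy examples here.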
Module 2: Overview of linear and logistic regression with regularization (30 min)
The mathematical models behind linear and logistic regression
The cost function
Brief description of gradient descent
Motivation for regularization
L1 (Lasso) and L2 (Ridge) regularization
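The topics in this module can be sketched in a few lines of NumPy. The example below is a minimal illustration, assuming a mean squared error cost with an L2 (Ridge) penalty and plain batch gradient descent; the synthetic data, the function names, the learning rate, and the step count are all assumptions made for this sketch. An L1 (Lasso) penalty would use the sum of absolute weights instead of squared weights, and logistic regression would pass the linear prediction through a sigmoid and use the log loss.

```python
import numpy as np

# Synthetic data, invented for illustration: y depends linearly on two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.5 + 0.1 * rng.normal(size=100)

def predict(X, w, b):
    """Linear model: weighted sum of the features plus an intercept."""
    return X @ w + b

def ridge_cost(X, y, w, b, alpha):
    """Mean squared error plus an L2 penalty on the weights (not the intercept)."""
    residuals = predict(X, w, b) - y
    return np.mean(residuals ** 2) + alpha * np.sum(w ** 2)

def gradient_descent(X, y, alpha=0.0, learning_rate=0.1, n_steps=500):
    """Minimize ridge_cost by repeatedly stepping against its gradient."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(n_steps):
        residuals = predict(X, w, b) - y
        grad_w = 2 * X.T @ residuals / n + 2 * alpha * w
        grad_b = 2 * residuals.mean()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w_plain, b_plain = gradient_descent(X, y, alpha=0.0)
w_ridge, b_ridge = gradient_descent(X, y, alpha=1.0)
print("no regularization:", w_plain, b_plain)
print("alpha = 1.0:      ", w_ridge, b_ridge)
```

With no penalty the fitted weights land close to the values used to generate the data, while the penalized fit shrinks the weights toward zero. That shrinkage is the mechanism regularization uses to limit model complexity.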
Module 3: The bias-variance tradeoff (40 min)
Split a dataset into train/validation/test sets
Standardize the dataset
Train linear models with various regularization strengths
Calculate the train and validation scores
Plot the scores and the predictions of corresponding models
Identify regions of high bias and high variance
Select the best regularization strength
Calculate the test score
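The steps of this module can be sketched end to end as follows. This is a minimal illustration under stated assumptions rather than the course's own notebook: the noisy sine data, the polynomial features, the 60/20/20 split, the grid of regularization strengths, and the helper names are invented here, and a closed-form ridge solution in NumPy stands in for scikit-learn (which offers `train_test_split`, `StandardScaler`, and `Ridge` for the same steps). Scores are mean squared errors, so lower is better. Plotting is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic IID data, invented for illustration: a noisy sine curve.
n = 150
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + 0.3 * rng.normal(size=n)

# Polynomial features make the linear model flexible enough to overfit.
X = np.column_stack([x ** d for d in range(1, 13)])

# 1. Shuffle and split into train/validation/test sets (60/20/20).
idx = rng.permutation(n)
train, val, test = idx[:90], idx[90:120], idx[120:]

# 2. Standardize using statistics from the training set only.
mean = X[train].mean(axis=0)
std = X[train].std(axis=0)
Xs = (X - mean) / std

def fit_ridge(X, y, alpha):
    """Closed-form ridge fit; assumes the columns of X have zero mean."""
    b = y.mean()  # unpenalized intercept
    A = X.T @ X + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ (y - b))
    return w, b

def mse(X, y, w, b):
    return float(np.mean((X @ w + b - y) ** 2))

# 3. Train one model per regularization strength and record both scores.
alphas = np.logspace(-4, 4, 9)
train_mse, val_mse = [], []
for alpha in alphas:
    w, b = fit_ridge(Xs[train], y[train], alpha)
    train_mse.append(mse(Xs[train], y[train], w, b))
    val_mse.append(mse(Xs[val], y[val], w, b))

# 4. Select the strength with the best validation score, then score the test set once.
best_alpha = alphas[int(np.argmin(val_mse))]
w, b = fit_ridge(Xs[train], y[train], best_alpha)
test_mse = mse(Xs[test], y[test], w, b)

for alpha, tr, va in zip(alphas, train_mse, val_mse):
    print(f"alpha={alpha:g}  train MSE={tr:.3f}  validation MSE={va:.3f}")
print(f"best alpha={best_alpha:g}  test MSE={test_mse:.3f}")
```

In a table like the one this prints, very large regularization strengths typically show high train and validation errors together (high bias), while a widening gap between a low train error and a higher validation error points to high variance. The test set is used only once, after the regularization strength has been chosen.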
Python coding experience
Familiarity with pandas and numpy
Prior experience with scikit-learn and matplotlib is a plus but not required