Supervised Learning 4: Non-linear Supervised Machine Learning Algorithms

This course is available only as a part of subscription plans.

Course Abstract

Training duration: 90 min (Hands-on)

Supervised Learning is a course series that walks through all the steps of the classical supervised machine learning pipeline. We use Python and packages like scikit-learn, pandas, numpy, and matplotlib. The series covers cross-validation and splitting strategies, evaluation metrics, supervised machine learning algorithms (linear and logistic regression, support vector machines, and tree-based methods such as random forest, gradient boosting, and XGBoost), and interpretability. You can complete the courses in sequence or pick individual courses based on your interest.

In part 4 of the series, we review four non-linear supervised machine learning algorithms: K-Nearest Neighbors, Support Vector Machines, Random Forests, and XGBoost. When you work on a project, you should generally try as many algorithms as you can on your dataset because it is difficult to know a priori which one will perform best. It is therefore important to understand how the various algorithms work, which hyperparameters need to be tuned, and what the pros and cons of each algorithm are.

We will not cover the in-depth math behind these algorithms as we did with linear and logistic regression in part 1, but you will have a solid intuitive understanding of how they work upon completing this course. We will use a couple of toy datasets and visualizations that I found helpful when learning about the properties of a new algorithm, so you will be well equipped to master algorithms we do not cover here on your own. I will also describe a couple of insights I gained about these algorithms over the years that might not be obvious to new users.
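
The advice to try several algorithms on the same dataset can be sketched as a short comparison loop. This is only an illustration and not taken from the course materials: the toy dataset (`make_moons`), the three models, and every hyperparameter value are assumptions chosen for brevity, and nothing is tuned.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy two-class dataset with a non-linear decision boundary (illustrative only).
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

# Candidate models with arbitrary, untuned hyperparameters.
# KNN and SVM are distance-based, so the features are standardized first.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in cv_scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

In a real project, each model's hyperparameters would be tuned inside the cross-validation loop before the scores are compared.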

DIFFICULTY LEVEL: INTERMEDIATE

Learning Objectives

Summarize how each algorithm works
Describe which hyperparameters need to be tuned and which value ranges to explore
Apply the algorithms in regression and classification
Visualize the predictions on toy datasets
Summarize under what circumstances a certain algorithm is expected to perform well or poorly and why

Instructor Bio:

Andras Zsom, PhD

Lead Data Scientist and Adjunct Lecturer in Data Science | Brown University, Center for Computation and Visualization

Andras Zsom is a Lead Data Scientist in the Center for Computation and Visualization and an Adjunct Lecturer in Data Science at Brown University, Providence, RI, USA. He works with high-level academic administrators to tackle predictive modeling problems and to promote data-driven decision making, collaborates with faculty members on data-intensive research projects, and teaches a mandatory course in the Data Science Master’s curriculum.

Course Outline

Module 1: KNN

General overview of why we need to train multiple algorithms on the same dataset
Introduce the pros and cons summary table we will fill out for each algorithm
Describe how KNN works
Walk through the hyperparameters and what the range of the values should be
Apply it to toy datasets in regression and classification
Visualize the predictions to learn about the properties of the model
Summarize pros and cons
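
As a rough sketch of what Module 1 covers (not the course's own notebook), the snippet below fits a KNN regressor to a noisy sine curve for three values of `n_neighbors`. The dataset, the noise level, and the chosen values of k are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1D regression problem: noisy samples of a sine curve (illustrative only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, size=100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=100)

# Training R^2 for a very flexible, a moderate, and a very rigid model.
train_r2 = {}
for k in (1, 10, 100):
    model = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    train_r2[k] = model.score(X, y)

for k, r2 in train_r2.items():
    print(f"n_neighbors={k}: training R^2 = {r2:.3f}")
```

With k=1 every training point is its own nearest neighbor, so the model reproduces the training targets, noise included. With k equal to the number of training points, every prediction is the mean of all targets. Useful values usually lie between these extremes and are chosen on a validation set, not on the training score.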

Module 2: SVM

Describe how SVM works, with a focus on the radial basis function (RBF) kernel
Walk through the hyperparameters and what the range of the values should be
Apply it to toy datasets in regression and classification
Visualize the predictions to learn about the properties of the model
Summarize pros and cons
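
To illustrate why the RBF kernel matters for non-linear problems, here is a minimal sketch comparing a linear and an RBF support vector classifier on concentric circles. The dataset and the values of `C` and `gamma` are assumptions picked for the example, not recommendations from the course.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Same regularization strength C; only the kernel differs.
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma=1.0))

linear_acc = linear_svm.fit(X_train, y_train).score(X_test, y_test)
rbf_acc = rbf_svm.fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel test accuracy: {linear_acc:.3f}")
print(f"RBF kernel test accuracy:    {rbf_acc:.3f}")
```

For an RBF SVM in scikit-learn, `C` and `gamma` are the two hyperparameters that typically need tuning, and they are commonly searched on a logarithmic grid.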

Module 3: RF

Describe how RF works, starting with CARTs (classification and regression trees)
Walk through the hyperparameters and what the range of the values should be
Apply it to toy datasets in regression and classification
Visualize the predictions to learn about the properties of the model
Summarize pros and cons
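
The relationship between a single tree and a forest can be sketched as follows. The example assumes a noisy sine dataset and a shallow `max_depth=2` so that the step-like predictions are easy to see; these choices are illustrative, not the course's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Toy 1D regression problem: noisy samples of a sine curve (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# A single shallow tree versus a forest of equally shallow trees.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
forest = RandomForestRegressor(
    n_estimators=100, max_depth=2, random_state=0
).fit(X, y)

# A depth-2 tree has at most 4 leaves, so it predicts at most 4 distinct values.
# Averaging many trees grown on bootstrap samples gives a finer step function.
X_grid = np.linspace(0, 2 * np.pi, 500).reshape(-1, 1)
n_tree_levels = len(np.unique(tree.predict(X_grid)))
n_forest_levels = len(np.unique(forest.predict(X_grid)))
print(f"distinct predictions: tree={n_tree_levels}, forest={n_forest_levels}")

# Tree-based models do not extrapolate: beyond the training range the
# prediction stays constant.
far_right = forest.predict([[10.0], [100.0]])
print(f"predictions at x=10 and x=100: {far_right}")
```

Plotting `tree.predict(X_grid)` and `forest.predict(X_grid)` against the training points with matplotlib makes both properties visible at a glance.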

Module 4: XGBoost

Describe how XGBoost works and contrast it with other tree-based models
Walk through the hyperparameters and what the range of the values should be
Apply it to toy datasets in regression and classification
Visualize the predictions to learn about the properties of the model
Summarize pros and cons
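
The main contrast with a random forest is that boosted trees are built sequentially, each one correcting the errors of the ensemble so far. The sketch below is a hand-written gradient boosting loop for squared error using scikit-learn trees. It is a simplified stand-in, not XGBoost itself: it omits XGBoost's regularization and other refinements, and `n_rounds`, `learning_rate`, and `max_depth` are arbitrary illustrative values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1D regression problem: noisy samples of a sine curve (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

n_rounds = 50        # number of trees added one after another
learning_rate = 0.1  # how much of each tree's correction is applied

# Start from a constant prediction: the mean of the targets.
pred = np.full(y.shape, y.mean())
mse_history = [np.mean((y - pred) ** 2)]

for _ in range(n_rounds):
    # For squared error, the negative gradient is simply the residual.
    residual = y - pred
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - pred) ** 2))

print(f"training MSE before boosting: {mse_history[0]:.3f}")
print(f"training MSE after {n_rounds} rounds: {mse_history[-1]:.3f}")
```

Because each tree depends on the previous ones, the trees cannot be grown independently the way random forest trees are, and the number of rounds and the learning rate become key hyperparameters that trade off against each other.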

Background knowledge

Python coding experience
Familiarity with pandas and numpy
Prior experience with scikit-learn and matplotlib is a plus but not required

Applicable Use-cases

The dataset can be expressed as a 2D feature matrix with the columns as features and the rows as data points
A continuous or categorical target variable exists
Examples include, but are not limited to, detecting fraud, predicting whether patients have a certain illness, predicting the selling or rental price of properties, and predicting customer satisfaction
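
The data layout described above can be sketched with a tiny array. All feature names and values below are hypothetical and exist only to show the shapes involved.

```python
import numpy as np

# Hypothetical property listings: each row is a data point, each column a feature.
feature_names = ["area_sqm", "n_bedrooms", "building_age_years"]
X = np.array([
    [54.0, 2, 30],
    [120.0, 4, 5],
    [75.0, 3, 12],
])

# Continuous target (regression): hypothetical selling prices.
y_price = np.array([210_000.0, 540_000.0, 330_000.0])

# Categorical target (classification): hypothetical "sold within 30 days" labels.
y_sold_fast = np.array([0, 1, 1])

n_samples, n_features = X.shape
print(f"{n_samples} data points, {n_features} features: {feature_names}")
```

The same feature matrix can serve a regression or a classification problem; only the target variable changes.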
