Learning Objectives

  • Summarize how each algorithm works

  • Describe which hyperparameters need to be tuned and what range the values should have

  • Apply the algorithms in regression and classification

  • Visualize the model predictions on toy datasets

  • Summarize under what circumstances a certain algorithm is expected to perform well or poorly and why

Course Outline

Module 1: KNN 

  • Give a general overview of why we need to train multiple algorithms on the same dataset
  • Introduce the pros and cons summary table we will fill out for each algorithm
  • Describe how k-nearest neighbors (KNN) works
  • Walk through the hyperparameters and what the range of the values should be
  • Apply it to toy datasets in regression and classification
  • Visualize the predictions to learn about the properties of the model
  • Summarize pros and cons
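
As a rough illustration of the kind of exercise this module describes, the sketch below fits a KNN classifier to a toy dataset for a few values of n_neighbors. The dataset (make_moons), the split, and the specific values tried are assumptions made for this example, not material taken from the course.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy two-class dataset (an illustrative choice, not from the course).
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# n_neighbors is the main KNN hyperparameter: small values give flexible,
# noisy decision boundaries; large values give smoother ones.
scores = {}
for k in (1, 5, 15, 50):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)
    print(f"n_neighbors={k:>2}  test accuracy={scores[k]:.3f}")
```

Comparing the accuracies across values of n_neighbors gives a first feel for how the hyperparameter trades flexibility against smoothness.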


Module 2: SVM

  • Describe how support vector machines (SVMs) work, with a focus on radial basis function (RBF) kernels
  • Walk through the hyperparameters and what the range of the values should be
  • Apply it to toy datasets in regression and classification
  • Visualize the predictions to learn about the properties of the model
  • Summarize pros and cons
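
A minimal sketch of what applying an RBF-kernel SVM to a toy dataset and preparing its predictions for visualization might look like. The dataset (make_circles), the values of C and gamma, and the grid limits are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy dataset with a circular class boundary (illustrative choice).
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

# gamma sets the width of the RBF kernel; C sets the penalty on
# misclassified training points. Both values here are arbitrary.
model = SVC(kernel="rbf", C=1.0, gamma=1.0)
model.fit(X, y)

# Predict on a regular grid covering the data; the resulting 2D array
# is what one would hand to a contour plot to draw decision regions.
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 100), np.linspace(-1.5, 1.5, 100))
grid = np.column_stack([xx.ravel(), yy.ravel()])
zz = model.predict(grid).reshape(xx.shape)

train_acc = model.score(X, y)
print(f"training accuracy: {train_acc:.3f}")
print("grid prediction shape:", zz.shape)
```

The arrays xx, yy, and zz can be passed to matplotlib's contourf to draw the predicted regions; plotting is left out here so the sketch runs without a display.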


Module 3: RF

  • Describe how random forests (RF) work, starting with classification and regression trees (CART)
  • Walk through the hyperparameters and what the range of the values should be
  • Apply it to toy datasets in regression and classification
  • Visualize the predictions to learn about the properties of the model
  • Summarize pros and cons
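
To make the step from a single tree to a forest concrete, the sketch below fits one regression tree and one random forest to a noisy sine curve and compares their errors against the noise-free function. The data-generating function, noise level, and hyperparameter values are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Noisy samples of sin(x) on [0, 6] (illustrative toy regression problem).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Evaluation points and the noise-free target at those points.
X_new = np.linspace(0, 6, 50).reshape(-1, 1)
y_true = np.sin(X_new).ravel()

# A single fully grown tree versus an ensemble of depth-limited trees.
tree = DecisionTreeRegressor(random_state=0).fit(X, y)
forest = RandomForestRegressor(
    n_estimators=100, max_depth=5, random_state=0
).fit(X, y)

tree_mse = np.mean((tree.predict(X_new) - y_true) ** 2)
forest_mse = np.mean((forest.predict(X_new) - y_true) ** 2)
print(f"single tree MSE:   {tree_mse:.4f}")
print(f"random forest MSE: {forest_mse:.4f}")
```

Both models produce piecewise-constant predictions; averaging many trees typically smooths out the steps that a single tree fits to noise.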


Module 4: XGBoost

  • Describe how XGBoost works and contrast it with other tree-based models
  • Walk through the hyperparameters and what the range of the values should be
  • Apply it to toy datasets in regression and classification
  • Visualize the predictions to learn about the properties of the model
  • Summarize pros and cons
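
The sketch below illustrates the boosting idea behind this module: trees are added one after another, each fitted to correct the current ensemble. To stay within scikit-learn, it uses GradientBoostingClassifier as a stand-in; this is not XGBoost itself, although the xgboost package exposes similarly named hyperparameters (n_estimators, learning_rate, max_depth). The dataset and all values are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Toy two-class dataset (illustrative choice).
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Stand-in for XGBoost: scikit-learn's gradient boosting implementation.
model = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting
# round, which shows how accuracy evolves as trees are added.
staged_acc = [
    float((pred == y_test).mean()) for pred in model.staged_predict(X_test)
]
for n_trees in (1, 10, 50, 200):
    print(f"{n_trees:>3} trees  test accuracy={staged_acc[n_trees - 1]:.3f}")
```

Unlike a random forest, where trees are trained independently and averaged, each tree here depends on the ones before it, so the number of trees and the learning rate need to be tuned together.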

Instructor's Bio: Andras Zsom, PhD

Andras Zsom is a Lead Data Scientist in the Center for Computation and Visualization group at Brown University, Providence, RI. He works with high-level academic administrators to tackle predictive modeling problems, collaborates with faculty members on data-intensive research projects, and was the instructor of a data science course offered to the data science master's students at Brown.

Who will be interested in this course?

  • Python coding experience

  • Familiarity with pandas and numpy

  • Prior experience with scikit-learn and matplotlib is a plus but not required