Supervised Learning is a course series that walks through all steps of the classical supervised machine learning pipeline. We use Python and packages like scikit-learn, pandas, numpy, and matplotlib. The course series focuses on topics like cross-validation and splitting strategies, evaluation metrics, supervised machine learning algorithms (such as linear and logistic regression, support vector machines, and tree-based methods like random forest, gradient boosting, and XGBoost), and interpretability. You can complete the courses in sequence or pick individual courses based on your interests.
In part 4 of the course series, we review four non-linear supervised machine learning algorithms: K-Nearest Neighbors, Support Vector Machines, Random Forests, and XGBoost. When you work on a project, you should generally try as many algorithms as you can on your dataset because it is difficult to know a priori which algorithm will perform best. It is therefore important to understand how various algorithms work, which hyperparameters need to be tuned, what the pros and cons of each algorithm are, and so on. While we will not cover the in-depth math behind these algorithms as we did with linear and logistic regression in part 1, you will have a solid intuitive understanding of how the algorithms work upon completing this course. We will use a couple of toy datasets and visualizations that I found helpful when learning about the properties of a new algorithm. As a result, you will be well equipped to master, on your own, other algorithms we do not cover here. I will also describe a couple of insights I gained about these algorithms over the years that might not be obvious to new users.
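To make the "try several algorithms" advice concrete, here is a minimal sketch that compares the four algorithms with 5-fold cross-validation. The dataset (scikit-learn's built-in breast cancer data) and the hyperparameter values are illustrative assumptions, not the settings used in the course.

```python
# A minimal sketch: compare several algorithms with cross-validation.
# Dataset and hyperparameter values below are illustrative, not course settings.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier  # requires the xgboost package

X, y = load_breast_cancer(return_X_y=True)

# KNN and SVM are distance/margin based, so they benefit from feature scaling;
# tree ensembles are insensitive to monotonic feature transformations.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "XGBoost": XGBClassifier(n_estimators=300, learning_rate=0.1, random_state=0),
}

# Report mean and standard deviation of the cross-validated accuracy.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Note that KNN and SVM are wrapped in pipelines with feature scaling while the tree ensembles are not; this kind of algorithm-specific preprocessing is exactly the sort of pros-and-cons knowledge the course covers.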