Machine Learning: Supervised and Unsupervised Learning
This course is available only as a part of subscription plans.
Supervised Machine Learning is a 6-part course series that walks through all steps of the classical supervised machine learning pipeline. We use Python and packages like scikit-learn, pandas, numpy, and matplotlib. The course series focuses on topics like cross-validation and splitting strategies, evaluation metrics, supervised machine learning algorithms (like linear and logistic regression, support vector machines, and tree-based methods like the random forest, gradient boosting, and XGBoost), and interpretability.
Unsupervised Machine Learning is a 3-part course series that provides a foundational understanding of one of the major branches of machine learning: unsupervised learning. Most of the world’s data is unlabeled, and applying machine learning to this unlabeled data to solve real-world problems is one of the great challenges of artificial intelligence.
We will show why unsupervised learning is so critical to working with data, especially when the data is not only unlabeled but also very large in scale and volume. We will compare unsupervised learning with supervised learning and later combine the two approaches to develop semi-supervised learning solutions.
This course is an applied course, and we will use two simple, production-ready Python frameworks to develop unsupervised learning solutions: scikit-learn and TensorFlow. We will also use pandas, numpy, matplotlib, and other common data science packages.
Using unsupervised learning, we will discover meaningful patterns buried deep in data, patterns that may be near impossible for humans to find. We will use unsupervised learning to detect anomalies, perform group segmentation, develop recommender systems, and generate synthetic data such as text and images.
The course series focuses on topics such as dimensionality reduction (principal component analysis, singular value decomposition, random projection, Isomap, multidimensional scaling, locally linear embedding, t-SNE, dictionary learning, and independent component analysis), clustering (k-means, hierarchical clustering, DBSCAN, and HDBSCAN), autoencoders, restricted Boltzmann machines, deep belief networks, generative adversarial networks, and time series clustering.
Andras Zsom, PhD
Ankur Patel
Supervised Learning 1: Introduction to Machine Learning and the Bias-Variance Tradeoff
Module 1: Intro to Machine Learning
Module 2: Overview of linear and logistic regression with regularization
Module 3: The bias-variance tradeoff
Supervised Learning 2: How to Prepare your Data for Supervised Machine Learning
Module 1: Split IID data (train/validation/test, KFoldCV, stratified splits in classification)
Module 2: Split non-IID data (GroupKFold, TimeSeriesSplit)
Module 3: Preprocess features (OneHotEncoder and OrdinalEncoder for categorical features, StandardScaler for continuous features)
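As a rough illustration of the splitting and preprocessing tools named in this part, the sketch below uses a small made-up table. The column names `patient_id`, `age`, and `color` and all values are hypothetical, not course data. It assumes scikit-learn and pandas are installed, and shows GroupKFold keeping every row of a group on one side of the split, with a ColumnTransformer that one-hot encodes a categorical column and standardizes a continuous one.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two rows per patient, so the rows are not IID.
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "age": [34.0, 34.0, 51.0, 51.0, 27.0, 27.0, 62.0, 62.0],
    "color": ["red", "blue", "blue", "green", "red", "green", "blue", "red"],
})
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# One-hot encode the categorical column, standardize the continuous one.
preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    ("num", StandardScaler(), ["age"]),
])

splits = []
group_kfold = GroupKFold(n_splits=2)
for train_idx, val_idx in group_kfold.split(df, y, groups=df["patient_id"]):
    # Fit the preprocessor on the training fold only, then apply it to validation.
    X_train = preprocessor.fit_transform(df.iloc[train_idx])
    X_val = preprocessor.transform(df.iloc[val_idx])
    train_groups = set(df["patient_id"].iloc[train_idx])
    val_groups = set(df["patient_id"].iloc[val_idx])
    splits.append((train_groups, val_groups, X_train.shape, X_val.shape))

for train_groups, val_groups, train_shape, val_shape in splits:
    print(sorted(train_groups), sorted(val_groups), train_shape, val_shape)
```

Fitting the preprocessor inside the loop, on the training fold only, is a deliberate choice here: it keeps statistics from the validation rows out of the transformation.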
Supervised Learning 3: Evaluation Metrics in Supervised Machine Learning
Module 1: Hard predictions in classification (the confusion matrix and derived metrics such as accuracy, precision, recall, f_beta score)
Module 2: Working with predicted probabilities in classification (ROC curve, precision-recall curve, AUC, the logloss metric)
Module 3: Regression metrics (MSE, RMSE, MAE, R2 score)
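To make the metrics listed in this part concrete, here is a minimal sketch written in plain Python. The helper names (`confusion_counts`, `classification_metrics`, `regression_metrics`) and the toy labels are illustrative assumptions, not course material. In practice the same quantities are available in `sklearn.metrics`.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, beta=1.0):
    """Accuracy, precision, recall, and f_beta derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_beta": f_beta}

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R2 score for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Hypothetical toy predictions.
clf = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(clf)
print(reg)
```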
Supervised Learning 4: Non-linear Supervised Machine Learning Algorithms
Module 1: K-Nearest Neighbors
Module 2: Support Vector Machines (various kernels, hyperparameters, visualize predictions in simple cases with 1 or 2 features, pros and cons)
Module 3: Random Forests (CART, hyperparameters, visualize step-like predictions in simple cases with 1 or 2 features, pros and cons)
Module 4: XGBoost (hyperparameters, early stopping, missing values, pros and cons)
Supervised Learning 5: Missing Data in Supervised ML
Module 1: Missing Data Patterns
Module 2: Apply the Reduced-Features Model (also called the Pattern Submodel Approach)
Module 3: How to Determine the Patterns?
Module 4: Decide Which Approach is Best for Your Dataset
Supervised Learning 6: Interpretability
Module 1: Global feature importance using the coefficients of linear models
Module 2: Permutation feature importance and algorithm-specific metrics (e.g., Gini impurity, XGBoost metrics like weight, cover, gain)
Module 3: Local feature importance with SHAP values
Unsupervised Learning 1: Intro to Unsupervised Learning, Dimensionality Reduction, and Anomaly Detection
Module 1: Introduction to Unsupervised Learning
Module 2: Introduction to Dimensionality Reduction
Module 3: Application: Anomaly Detection
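One common way to connect dimensionality reduction to anomaly detection is to score each point by how poorly it is reconstructed from a low-dimensional projection. The sketch below assumes that approach (the outline does not say the course uses this exact recipe) and runs it on synthetic data. The mixing matrix, noise level, and injected outlier are all made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic "normal" data: 200 points lying close to a 2D plane inside 3D space.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 1.0]])  # hypothetical plane
X_normal = latent @ mixing + 0.01 * rng.normal(size=(200, 3))

# One injected anomaly that sits off the plane (index 200).
outlier = np.array([[1.0, -2.0, 2.0]])
X_all = np.vstack([X_normal, outlier])

# Project to 2 components, map back to 3D, and measure what was lost.
pca = PCA(n_components=2).fit(X_all)
X_reconstructed = pca.inverse_transform(pca.transform(X_all))
scores = np.sum((X_all - X_reconstructed) ** 2, axis=1)

most_anomalous = int(np.argmax(scores))
print("highest reconstruction error at index", most_anomalous)
```

Points that follow the dominant structure of the data reconstruct almost perfectly, while the injected point loses most of its magnitude in the projection, which is why its score stands out.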
Unsupervised Learning 2: Clustering and Group Segmentation
Module 1: Introduction to Clustering
Module 2: Overview of Clustering Algorithms
Module 3: Application: Group Segmentation
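For a feel of what group segmentation with clustering looks like in code, here is a small sketch using k-means from scikit-learn on synthetic blobs. The blob centers, sample count, and the choice of three clusters are assumptions made for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic groups (hypothetical centers).
centers = np.array([[-10.0, -10.0], [0.0, 10.0], [10.0, -10.0]])
X, true_groups = make_blobs(n_samples=150, centers=centers,
                            cluster_std=0.5, random_state=0)

# Fit k-means without using the true group labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X)

# For each true group, collect the cluster ids assigned to its points.
assignment = {int(g): set(segments[true_groups == g].tolist())
              for g in np.unique(true_groups)}
print(assignment)
```

Cluster ids are arbitrary, so the useful check is whether each true group lands in a single cluster, not whether the ids match the original labels.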
Unsupervised Learning 3: Deep Unsupervised Learning, Semi-supervised Learning, and Generative Models
Module 1: Introduction to Deep Unsupervised Learning
Module 2: Semi-supervised Learning
Module 3: Generative Modeling
Python coding experience
Familiarity with pandas, numpy, and scikit-learn
Prior experience with matplotlib is a plus but not required
Understanding of basic machine learning concepts, including supervised learning
Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus