Get Ahead with Expert-Led Training in Unsupervised Machine Learning

In this 3-part course series, we will provide a foundational understanding of one of the major branches of machine learning: unsupervised learning. Most of the world’s data is unlabeled, and applying machine learning to this unlabeled data to solve real-world problems is one of the great challenges of artificial intelligence. 

We will show why unsupervised learning is so critical to working with data, especially when the data is not only unlabeled but also large in scale and high in volume. We will compare unsupervised learning with supervised learning and later combine the two approaches to develop semi-supervised learning solutions.

This course is an applied course, and we will use two simple, production-ready Python frameworks to develop unsupervised learning solutions: scikit-learn and TensorFlow. We will also use pandas, numpy, matplotlib, and other common data science packages.

Using unsupervised learning, we will discover meaningful patterns buried deep in data, patterns that may be nearly impossible for humans to find. We will use unsupervised learning to detect anomalies, perform group segmentation, develop recommender systems, and generate synthetic data such as text and images.

The course series focuses on topics such as dimensionality reduction (principal component analysis, singular value decomposition, random projection, isomap, multidimensional scaling, locally linear embedding, t-SNE, dictionary learning, and independent component analysis), clustering (k-means, hierarchical clustering, DBSCAN, and HDBSCAN), autoencoders, restricted Boltzmann machines, deep belief networks, generative adversarial networks, and time series clustering.

You can complete the courses in sequence or complete individual courses based on your interest.


Co-founder and Head of Data | Glean

Ankur Patel

Ankur Patel is the co-founder & Head of Data at Glean, an AI-powered spend intelligence solution for managing vendor spend, and the co-founder of Mellow, a fully managed machine learning platform for SMBs. He is an applied machine learning specialist in both unsupervised learning and natural language processing, and he is the author of Hands-on Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data and Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand. Prior to founding Glean and Mellow, Ankur led data science and machine learning teams at several startups including 7Park Data, ThetaRay, and R-Squared Macro and was the lead emerging markets trader at Bridgewater Associates. He is a graduate of Princeton University and currently resides in New York City.

Unsupervised Learning 1: Intro to Unsupervised Learning, Dimensionality Reduction, and Anomaly Detection

Module 1: Introduction to Unsupervised Learning

  • How unsupervised learning fits into the machine learning ecosystem
  • Common problems in machine learning: insufficient labeled data, curse of dimensionality, and outliers

Module 2: Introduction to Dimensionality Reduction

  • Motivation for dimensionality reduction: reduce computational complexity of large data, remove non-relevant information and surface salient information, perform anomaly detection, perform clustering
  • Linear dimensionality reduction algorithms
  • Non-linear dimensionality reduction algorithms

Module 3: Application: Anomaly Detection

  • Introduce use case: credit card fraud detection
  • Explore and prepare the data
  • Define evaluation function
  • Apply linear dimensionality reduction and evaluate results
  • Apply non-linear dimensionality reduction and evaluate results
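
To make the Module 3 workflow concrete, here is a minimal sketch of anomaly detection through linear dimensionality reduction. It is an illustration under stated assumptions, not the course's actual notebook: the data is synthetic (a stand-in for credit card transactions), the helper name anomaly_scores is hypothetical, and average precision is assumed as the evaluation function. The idea is that PCA learns the dominant structure of the data, so rows it reconstructs poorly are candidates for anomalies.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import average_precision_score
from sklearn.preprocessing import StandardScaler


def anomaly_scores(X, n_components):
    """Score each row by its squared PCA reconstruction error.

    Hypothetical helper: a higher score means the row is harder to
    reconstruct from the top principal components.
    """
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components, random_state=0)
    X_reduced = pca.fit_transform(X_scaled)
    X_restored = pca.inverse_transform(X_reduced)
    return np.sum((X_scaled - X_restored) ** 2, axis=1)


# Synthetic stand-in data (an assumption, not the course dataset):
# normal rows lie close to a 2-D subspace of a 10-D feature space,
# while the injected outliers are scattered in all 10 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 10))
normal = latent @ mixing + 0.05 * rng.normal(size=(1000, 10))
outliers = rng.normal(scale=3.0, size=(20, 10))

X = np.vstack([normal, outliers])
y = np.r_[np.zeros(1000), np.ones(20)]  # labels used only for evaluation

scores = anomaly_scores(X, n_components=2)

# Assumed evaluation function: average precision of the ranking by score.
avg_precision = average_precision_score(y, scores)
print(f"average precision: {avg_precision:.3f}")
```

The labels are never shown to PCA; they are used only afterwards to check how well the reconstruction error ranks the injected outliers. A non-linear method could be evaluated the same way, provided it offers a way to map reduced data back to the original space.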

Unsupervised Learning 2: Clustering and Group Segmentation

Module 1: Introduction to Clustering

  • Why the need for clustering exists: the real-world motivation
  • How to find patterns in data with zero or few labels
  • How to efficiently label data when only a few labels are available

Module 2: Overview of Clustering Algorithms

  • K-Means
  • Hierarchical clustering
  • Apply to MNIST and Fashion MNIST datasets
  • Visualize clusters and evaluate results
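
As a rough illustration of the two algorithms listed in this module, the sketch below runs k-means and hierarchical (agglomerative) clustering with scikit-learn and compares each result against known group labels. The assumptions are loud ones: the data is a small synthetic set of four blobs rather than MNIST or Fashion MNIST, and the adjusted Rand index is just one possible evaluation metric, chosen here because the true groups are known.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in data (an assumption): four well-separated 2-D groups.
centers = [[0, 0], [6, 6], [0, 6], [6, 0]]
X, y_true = make_blobs(
    n_samples=600, centers=centers, cluster_std=0.8, random_state=0
)

# K-means: partitions the data around 4 learned centroids.
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Hierarchical clustering: merges points bottom-up using Ward linkage,
# then cuts the tree at 4 clusters.
hier_labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X)

# The adjusted Rand index compares cluster assignments with the true groups;
# it ignores how the cluster IDs happen to be numbered.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_hier = adjusted_rand_score(y_true, hier_labels)
print(f"k-means ARI: {ari_kmeans:.3f}, hierarchical ARI: {ari_hier:.3f}")
```

On real image data such as MNIST, a dimensionality reduction step would typically come first, and the number of clusters would have to be chosen rather than known in advance.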

Module 3: Application: Group Segmentation

  • Introduce use case: loan applications
  • Explore and prepare the data
  • Define evaluation function
  • Apply clustering algorithms and evaluate results

Unsupervised Learning 3: Semi-supervised Learning, Deep Unsupervised Learning, and Generative Models

Module 1: Introduction to Semi-Supervised Learning

  • Motivation for representation learning and refresher on neural networks and automatic feature engineering
  • Intro to semi-supervised learning and how supervised and unsupervised learning complement each other
  • Autoencoders and their variants (undercomplete vs. overcomplete autoencoders, dense vs. sparse autoencoders, denoising autoencoders, and variational autoencoders)

Module 2: Application: Semi-supervised Fraud Detection using Autoencoders

  • Introduce use case: credit card fraud detection
  • Explore and prepare the data
  • Define evaluation function
  • Build unsupervised learning fraud detection solution and evaluate results
  • Build supervised learning fraud detection solution and evaluate results
  • Build semi-supervised learning fraud detection solution and evaluate results
  • Compare and contrast results

Module 3: Deep Unsupervised Learning and Generative Models

  • Intro to deep unsupervised learning
  • Intro to generative modeling and synthetic data
  • GANs and their variants
  • Demonstration of GANs in action using code

Background knowledge

  • Python coding experience

  • Familiarity with pandas, numpy, and scikit-learn

  • Understanding of basic machine learning concepts, including supervised learning

  • Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus

Applicable Use-cases

  • Fraud Detection: Identify fraud in transactional data such as credit card, ACH, wire, and insurance claims

  • Anti-money Laundering: Detect potential money laundering for banks

  • Cybersecurity: Stop malicious activity such as hacking

  • Machine Maintenance: Monitor sensor data to detect when machines are starting to malfunction

  • Disease Diagnosis: Spot potential disease using healthcare IoT sensor data