Course Abstract

Training duration: 90 min (Hands-on)

In this 90-minute course, Ankur Patel will explore one of the core concepts in unsupervised learning: clustering. Clustering segments entities (e.g., users) into distinct, homogeneous groups such that members of a group are very similar to one another but distinctly different from members of other groups. This segmentation requires no labels whatsoever; instead, it separates entities based on their behavior. For example, via clustering, online shoppers could be grouped into budget-conscious shoppers, high-end shoppers, frequent shoppers, seasonal shoppers, technophiles, audiophiles, sneakerheads, back-to-school shoppers, young parents, senior citizens, and millennials. Clustering well requires good feature engineering. In this course, we will explore loan applications, perform feature engineering, and segment users based on their potential creditworthiness. We will also explore how clustering enables efficient labeling: once entities are grouped, a handful of labels can be extended to whole clusters, turning unlabeled problems into labeled ones and opening up the realm of semi-supervised learning.

DIFFICULTY LEVEL: INTERMEDIATE

Instructor Bio:

Ankur Patel

Course Outline

Module 1: Introduction to Clustering

  • Why the need for clustering exists / the real-world motivation
  • How to find patterns in data with zero or few labels
  • How to efficiently label data when only a few labels are available
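
To make the last point concrete, here is a minimal sketch of cluster-based labeling, assuming cluster assignments are already available. The function name `propagate_labels`, the toy cluster ids, and the shopper labels are hypothetical and chosen only for illustration; the course may use a different approach. Each cluster takes the majority label among its few labeled members, and clusters with no labeled member stay unlabeled.

```python
from collections import Counter, defaultdict


def propagate_labels(cluster_ids, known_labels):
    """Extend a few known labels to every member of the same cluster.

    cluster_ids  : list with one cluster id per data point
    known_labels : dict mapping a point's index to its known label
    Returns a list with one label per point (None if the cluster has no labels).
    """
    # Count the known labels that fall inside each cluster.
    votes = defaultdict(Counter)
    for idx, label in known_labels.items():
        votes[cluster_ids[idx]][label] += 1

    # Every point inherits the majority label of its cluster, if there is one.
    return [
        votes[cid].most_common(1)[0][0] if cid in votes else None
        for cid in cluster_ids
    ]


# Toy example: eight shoppers in three clusters, only two of them labeled.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
known = {0: "budget-conscious", 4: "high-end"}
labels = propagate_labels(cluster_ids, known)
```

With two labels, six of the eight points receive a label; the third cluster remains unlabeled until at least one of its members is labeled.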

Module 2: Overview of Clustering Algorithms

  • K-Means
  • Hierarchical clustering
  • DBSCAN
  • HDBSCAN
  • Apply to MNIST and Fashion MNIST datasets
  • Visualize clusters and evaluate results
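
As a rough illustration of the first algorithm in this module, the sketch below implements K-Means from scratch in plain Python on a toy 2-D dataset. It is not the course's code: in practice one would more likely use a library implementation such as scikit-learn's `KMeans`. The farthest-first initialization is an assumption made here to keep the example deterministic, and the data points are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic initialization: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans(points, k, n_iter=20):
    """Alternate assignment and centroid-update steps until nothing changes."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Two well-separated toy groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Real image data such as MNIST would need far more care with initialization, scaling, and dimensionality.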

Module 3: Application: Group Segmentation

  • Introduce use case: loan applications
  • Explore and prepare the data
  • Define evaluation function
  • Apply clustering algorithms and evaluate results
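
The outline does not specify which evaluation function the course defines. One plausible option, sketched below purely as an assumption, is cluster purity: the fraction of points whose known outcome matches the majority outcome of their cluster. The function name `cluster_purity` and the toy loan outcomes are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Fraction of points whose label equals the majority label of their cluster.

    1.0 means every cluster contains a single outcome; lower values mean
    the clusters mix outcomes.
    """
    # Group the known outcomes by cluster.
    by_cluster = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        by_cluster[cid].append(label)

    # For each cluster, count how many members carry its most common label.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in by_cluster.values()
    )
    return majority_total / len(true_labels)


# Toy example: two clusters of four loans each, with known outcomes.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
outcomes = ["repaid", "repaid", "repaid", "default",
            "default", "default", "default", "repaid"]
purity = cluster_purity(cluster_ids, outcomes)
```

Purity rewards clusters that separate outcomes cleanly, but it rises trivially as the number of clusters grows, so it is best compared across runs with a similar cluster count.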

Background knowledge

  • Python coding experience

  • Familiarity with pandas, numpy, and scikit-learn

  • Understanding of basic machine learning concepts, including supervised learning

  • Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus

Applicable Use-cases

  • Fraud Detection: Identify fraud in transactional data such as credit card, ACH, and wire transactions and insurance claims

  • Anti-money Laundering: Detect potential money laundering for banks

  • Cybersecurity: Stop malicious activity such as hacking

  • Machine Maintenance: Monitor sensor data to detect when machines are starting to malfunction

  • Disease Diagnosis: Spot potential disease using healthcare IoT sensor data