DIFFICULTY LEVEL: INTERMEDIATE

Instructor Bio:

Ankur Patel

Co-founder and Head of Data | Glean

Ankur Patel is the co-founder and Head of Data at Glean, which uses NLP to extract data from invoices and generate vendor spend intelligence for clients. Ankur is an applied machine learning specialist in both unsupervised learning and natural language processing. He is the author of Hands-On Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data and Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand. Previously, Ankur led teams at 7Park Data, ThetaRay, and R-Squared Macro; he began his career at Bridgewater Associates and J.P. Morgan. He is a graduate of Princeton University and currently resides in New York City.

Course Outline

Module 1: Introduction to Unsupervised Learning

  • How unsupervised learning fits into the machine learning ecosystem
  • Common problems in machine learning: insufficient labeled data, the curse of dimensionality, and outliers

Module 2: Introduction to Dimensionality Reduction

  • Motivation for dimensionality reduction: reduce the computational cost of working with large datasets, remove irrelevant information and surface salient information, and support anomaly detection and clustering
  • Linear dimensionality reduction algorithms
  • Non-linear dimensionality reduction algorithms

Module 3: Application: Anomaly Detection

  • Introduce use case: credit card fraud detection
  • Explore and prepare the data
  • Define evaluation function
  • Apply linear dimensionality reduction and evaluate results
  • Apply non-linear dimensionality reduction and evaluate results
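
As a rough illustration of the Module 3 workflow, the sketch below scores each point by its reconstruction error after a linear projection (PCA computed through NumPy's SVD) and checks how many of the highest-scoring points are true anomalies. It is a minimal sketch under stated assumptions, not the course's implementation: the data is synthetic rather than real credit card transactions, and the helper names anomaly_scores and precision_at_k are hypothetical, as is the choice of precision at k as the evaluation function.

```python
import numpy as np


def anomaly_scores(X, n_components):
    """Score each row by its squared PCA reconstruction error."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]
    # Project onto the top components, then map back to the original space.
    X_hat = Xc @ W.T @ W
    return np.sum((Xc - X_hat) ** 2, axis=1)


def precision_at_k(y_true, scores, k):
    """Fraction of the k highest-scoring rows that are labeled anomalous."""
    top_k = np.argsort(scores)[::-1][:k]
    return y_true[top_k].mean()


# Synthetic stand-in for transaction data (an assumption, not the course data):
# normal rows lie close to a 2-D plane embedded in 10-D, anomalies do not.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 10))
normal = latent @ mixing + 0.05 * rng.normal(size=(1000, 10))
anomalies = rng.normal(scale=3.0, size=(20, 10))

X = np.vstack([normal, anomalies])
y = np.concatenate([np.zeros(1000), np.ones(20)])

scores = anomaly_scores(X, n_components=2)
print("precision at 20:", precision_at_k(y, scores, k=20))
```

The labels are used only for evaluation, never for fitting, which is what keeps the approach unsupervised. A non-linear method can be dropped in by replacing anomaly_scores while keeping the same evaluation function, so results stay comparable.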

Background knowledge

  • Python coding experience

  • Familiarity with pandas, numpy, and scikit-learn

  • Understanding of basic machine learning concepts, including supervised learning

  • Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus

Applicable Use Cases

  • Fraud Detection: Identify fraud in transactional data such as credit card, ACH, wire, and insurance claims

  • Anti-Money Laundering: Detect potential money laundering for banks

  • Cybersecurity: Stop malicious activity such as hacking

  • Machine Maintenance: Monitor sensor data to detect when machines are starting to malfunction

  • Disease Diagnosis: Spot potential disease using healthcare IoT sensor data