Course Abstract

Training duration: 90 min (Hands-on)

In this 90-minute course, Ankur Patel explores autoencoders, one of the core concepts in unsupervised learning, and introduces semi-supervised learning. Autoencoders are neural networks, shallow in their simplest form, that learn representations of the original input data and output the newly learned representations. In other words, autoencoders perform automatic feature engineering, which limits the need for manual feature engineering and accelerates the building of machine learning systems. Autoencoders are also a means of leveraging the information in a partially labeled dataset: with them, we can turn unsupervised machine learning problems into semi-supervised ones.

In this course, we build unsupervised, supervised, and semi-supervised (using autoencoders) credit card fraud detection systems. First, we employ a purely unsupervised approach that uses no labels. Next, we employ a supervised approach on a partially labeled dataset. Finally, we apply autoencoders (an unsupervised learning technique) to the partially labeled dataset and combine them with a supervised approach to build a semi-supervised solution. We then compare and contrast the results of all three approaches.

We also introduce deep unsupervised learning and explore one of the hottest areas of unsupervised learning today: generative modeling using GANs (short for generative adversarial networks). The course closes with a demonstration of text- and image-based GANs in action.

DIFFICULTY LEVEL: ADVANCED

Instructor Bio:

Ankur Patel

Co-founder and Head of Data | Glean

Ankur Patel is the co-founder and Head of Data at Glean. Glean uses NLP to extract data from invoices and generate vendor spend intelligence for clients. Ankur is an applied machine learning specialist in both unsupervised learning and natural language processing, and he is the author of Hands-on Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data and Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand. Previously, Ankur led teams at 7Park Data, ThetaRay, and R-Squared Macro and began his career at Bridgewater Associates and J.P. Morgan. He is a graduate of Princeton University and currently resides in New York City.

Course Outline

Module 1: Introduction to Deep Unsupervised Learning

  • Motivation for representation learning and refresher on neural networks 
  • Compare shallow vs. deep learning and deep learning vs. classical machine learning
  • Explore use cases of deep unsupervised learning today


Module 2: Semi-supervised Learning

  • Intro to automatic feature extraction and autoencoders, including a comparison of autoencoders to dimensionality reduction and an overview of complete, undercomplete, and overcomplete autoencoders
  • Intro to semi-supervised learning using autoencoders and how supervised and unsupervised learning complement each other
  • Develop semi-supervised fraud detection application using autoencoders
  • Compare the unsupervised, supervised, and semi-supervised solutions and evaluate results
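
As a rough illustration of the undercomplete autoencoder idea in this module, the sketch below trains a tiny linear autoencoder with plain gradient descent and uses reconstruction error as an anomaly score. It is not the course's code: the data is synthetic, the NumPy-only implementation stands in for whatever framework the course uses, and names such as `anomaly_score`, `W_enc`, and `W_dec` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for transaction features (assumption: not the course dataset).
# Normal points lie close to a 2-D subspace of a 5-D feature space.
basis = np.array([[1.0, 0.0, 1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0, 1.0, 0.0]])
normal = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))
# Points scattered in all 5 dimensions stand in for fraudulent transactions.
anomalies = rng.normal(scale=3.0, size=(10, 5))

# Undercomplete linear autoencoder: 5 inputs -> 2 hidden units -> 5 outputs.
W_enc = 0.1 * rng.normal(size=(5, 2))
W_dec = 0.1 * rng.normal(size=(2, 5))


def reconstruct(X):
    """Encode X into the 2-D bottleneck, then decode back to 5-D."""
    return (X @ W_enc) @ W_dec


def anomaly_score(X):
    """Per-row mean squared reconstruction error."""
    return np.mean((X - reconstruct(X)) ** 2, axis=1)


initial_loss = anomaly_score(normal).mean()

# Train on the unlabeled normal data only, minimizing squared reconstruction error.
learning_rate = 0.05
for _ in range(2000):
    hidden = normal @ W_enc
    error = hidden @ W_dec - normal
    grad_dec = hidden.T @ error / len(normal)
    grad_enc = normal.T @ (error @ W_dec.T) / len(normal)
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec

final_loss = anomaly_score(normal).mean()

# Flag anything that reconstructs worse than 99% of the training data.
threshold = np.percentile(anomaly_score(normal), 99)
flagged = anomaly_score(anomalies) > threshold
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
print(f"flagged {flagged.sum()} of {len(anomalies)} synthetic anomalies")
```

Because the bottleneck has fewer units than the input, the network can only reconstruct points that follow the structure of the training data, so poorly reconstructed points are candidates for review. In a semi-supervised setup, the 2-D hidden representation (`X @ W_enc`) could also be passed as extra features to a supervised classifier trained on the labeled subset.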


Module 3: Generative Modeling

  • Intro to generative modeling, including restricted Boltzmann machines (RBMs), deep belief networks (DBNs), and generative adversarial networks (GANs)
  • Deep dive into GANs, including how a generator and a discriminator work together to produce synthetic data
  • Frame how generative modeling and GANs fit into the overall space of unsupervised learning
  • Demonstration of GANs in action using code to produce synthetic data

Background knowledge

  • Python coding experience

  • Familiarity with pandas, numpy, and scikit-learn

  • Understanding of basic machine learning concepts, including supervised learning

  • Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus

Applicable Use-cases

  • Fraud Detection: Identify fraud in transactional data such as credit card, ACH, wire, and insurance claims

  • Anti-money Laundering: Detect potential money laundering for banks

  • Cybersecurity: Stop malicious activity such as hacking

  • Machine Maintenance: Monitor sensor data to detect when machines are starting to malfunction

  • Disease Diagnosis: Spot potential disease using healthcare IoT sensor data