Course Abstract

Training duration: 90 min (Hands-on)

In this 90-minute course, Ankur Patel introduces natural language processing (NLP) and reviews its evolution over the last 70 years. We will explain why NLP matters today and how it powers many of the most popular applications we use every day. Then, we will dive into basic NLP tasks and perform them using one of the most popular open-source NLP libraries: spaCy. We will also explore two other major open-source NLP libraries: fast.ai and Hugging Face. Next, we will explore state-of-the-art (SOTA) NLP, including attention mechanisms, transformers, pretrained language models, transfer learning, and fine-tuning. We will conclude with a hands-on application of state-of-the-art NLP, using fast.ai to develop a sentiment analysis model for IMDb movie reviews.

DIFFICULTY LEVEL: INTERMEDIATE

Instructor Bio:

Ankur Patel

Co-founder and Head of Data | Glean

Ankur Patel is the co-founder and Head of Data at Glean. Glean uses NLP to extract data from invoices and generate vendor spend intelligence for clients. Ankur is an applied machine learning specialist in both unsupervised learning and natural language processing. He is the author of two books: Hands-On Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data, and Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand. Previously, Ankur led teams at 7Park Data, ThetaRay, and R-Squared Macro; he began his career at Bridgewater Associates and J.P. Morgan. He is a graduate of Princeton University and currently resides in New York City.

Course Outline

Module 1: Introduction to NLP


  • What is NLP?
  • History of NLP
  • Motivation for NLP
  • Popular NLP applications today


Module 2: Basic NLP


  • Define basic NLP tasks
  • Introduce modern open-source NLP software libraries: spaCy, fast.ai, and Hugging Face
  • Perform basic NLP tasks using spaCy: tokenization, part-of-speech tagging, dependency parsing, chunking, lemmatization, and stemming
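
To give a rough sense of what two of these tasks involve, here is a minimal sketch in plain Python. It deliberately does not use spaCy's API: the regular expression, the suffix list, and the function names `tokenize` and `naive_stem` are illustrative assumptions, and real tokenizers and stemmers handle far more cases.

```python
import re

# Illustrative assumption: a tiny suffix list, far cruder than a real stemmer.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def naive_stem(token):
    """Lowercase the token and strip the first matching suffix,
    keeping a stem of at least three characters."""
    lowered = token.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


tokens = tokenize("The parsers tokenized the reviews.")
stems = [naive_stem(t) for t in tokens]
print(tokens)
print(stems)
```

Note that the stem of "tokenized" here is "tokeniz", which is not a word. This is the usual contrast with lemmatization, which maps a token to its dictionary form ("tokenize") rather than chopping off a suffix.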


Module 3: State-of-the-Art (SOTA) NLP

  • Attention mechanisms and transformers
  • Pretrained language models, transfer learning, and fine-tuning
  • Application: IMDb movie review sentiment analysis using fast.ai
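
As a rough illustration of the attention mechanism named in this module, the sketch below computes scaled dot-product attention for a single query in plain Python. It is a toy example under stated assumptions, not the fast.ai or Hugging Face implementation: the function names and the two-dimensional vectors are made up for illustration, and real transformers apply this over batches of learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical toy vectors: three positions, two dimensions each.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([1.0, 0.0], keys, values)
print(weights, output)
```

The query points in the same direction as the first key, so the first and third positions receive more weight than the second, and the output leans toward their values.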


Conclusion

Background knowledge

  • Python coding experience

  • Familiarity with pandas, numpy, and scikit-learn

  • Understanding of basic machine learning concepts, including supervised learning

  • Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus

Applicable Use Cases

  • Semantic Search: search the entire web or a repository of documents and surface relevant search results (e.g., Google Search)

  • Natural Language Generation: auto-complete sentences as you write emails or draft documents (e.g., Gmail)

  • Machine Translation: convert text and audio from one language to another (e.g., Google Translate and Apple Translate)

  • Speech Recognition and Question Answering: give voice commands and control your home devices (e.g., Amazon Alexa, Apple Siri, Google Assistant, and Microsoft Cortana)

  • Customer Service Chatbots: ask account-related questions and get mostly reasonable answers (e.g., Intercom)