In this 3-part course series, we will provide a foundational understanding of one of the major branches of machine learning: natural language processing, also known as NLP. Historically, humans have used computers to crunch numbers but have relied on the human brain to analyze text and audio.
Now, machines are able to process text and audio in ways that most humans would have considered magical just two decades ago. NLP is a part of our everyday lives in the form of Google Search, Gmail Smart Compose, Google Translate, Amazon Alexa, Apple Siri, Google Assistant, Microsoft Cortana, and other applications. These commercial successes are the reason why NLP has exploded in popularity over the last few years.
We will cover the advances in NLP that have made these commercial successes possible. This is an applied course series: we will use several modern mainstream NLP libraries, including spaCy, fast.ai, and Hugging Face, to develop NLP applications. We will also use PyTorch, scikit-learn, pandas, numpy, and other common data science packages.
Using state-of-the-art NLP models such as BERT and GPT-3, we will solve NLP tasks such as named entity recognition and text classification, training models whose performance is comparable to, or better than, that of out-of-the-box systems.
We will also cover modern NLP techniques such as transfer learning and fine-tuning, and work with the transformer architecture, the most successful neural network architecture in NLP today. Along the way, we will cover core components of the NLP pipeline, including tokenizers and word embeddings.
Finally, we will cover the most important aspect of applied NLP: productionizing the models using automated pipelines, APIs, and web apps.
You can complete the courses in sequence or complete individual courses based on your interest.
Course 1
Module 1: Introduction to NLP
- What is NLP?
- History of NLP
- Motivation for NLP
- Popular NLP applications today
Module 2: Basic NLP
- Define basic NLP tasks
- Introduce modern open-source NLP software libraries: spaCy, fast.ai, and Hugging Face
- Perform basic NLP tasks using spaCy: tokenization, part-of-speech tagging, dependency parsing, chunking, lemmatization, and stemming
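As a toy illustration of what a tokenizer does (spaCy's rule-based tokenizer is far more sophisticated, handling contractions, URLs, and per-language exceptions), a single regular expression can split text into word and punctuation tokens:

```python
import re

def tokenize(text):
    """Naive tokenizer sketch: return runs of word characters or single
    punctuation marks. For illustration only -- spaCy's tokenizer would,
    for example, split "isn't" into "is" + "n't" rather than three pieces."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("NLP is fun, isn't it?"))
# ['NLP', 'is', 'fun', ',', 'isn', "'", 't', 'it', '?']
```

Every downstream step in the pipeline (tagging, parsing, lemmatization) operates on tokens like these, which is why tokenizer quality matters so much.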
Module 3: State-of-the-Art (SOTA) NLP
- Attention Mechanisms and Transformers
- Pretrained Language Models, Transfer Learning, and Fine-tuning
- Application: IMDb Movie Review Sentiment Analysis using fast.ai
Course 2
Module 1: Modern NLP in theory
- The path to NLP’s watershed “ImageNet” moment in 2018
- Word embeddings: one-hot encoding, word2vec, GloVe, fastText, and context-aware pretrained word embeddings
- Sequential models: vanilla recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs)
- Attention mechanisms and Transformers
- ULMFiT, ELMo, BERT, BERTology, GPT-1, GPT-2, and GPT-3
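One-hot encoding, the simplest embedding scheme in the list above, can be sketched in a few lines (the three-word vocabulary is made up for illustration):

```python
def one_hot(vocab):
    """Map each word to a vector with a single 1. The dimensionality
    equals the vocabulary size, and no two words share any similarity --
    the limitation that dense, learned embeddings such as word2vec,
    GloVe, and fastText were designed to overcome."""
    index = {word: i for i, word in enumerate(sorted(vocab))}
    return {w: [1 if i == index[w] else 0 for i in range(len(vocab))]
            for w in vocab}

vectors = one_hot({"cat", "dog", "fish"})
print(vectors["cat"])  # [1, 0, 0]
```

Because every pair of one-hot vectors is orthogonal, "cat" is exactly as dissimilar to "dog" as it is to "fish"; dense embeddings fix this by placing related words near each other in vector space.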
Module 2: Modern NLP in action
- Refresher on pretrained language models, transfer learning, and fine-tuning
- Introduction to common NLP tasks via Hugging Face: sequence classification, question answering, language modeling, text generation, named entity recognition, summarization, and translation
Module 3: Modern NLP applications
- Explore the AG News dataset
- Application #1: Named Entity Recognition (NER)
  - Perform inference using the out-of-the-box spaCy NER model
  - Annotate data using Prodigy
  - Develop a custom NER model using spaCy
  - Compare the custom NER model against the out-of-the-box spaCy NER model
- Application #2: Text Classification
  - Annotate data using Prodigy
  - Develop a text classification model using spaCy
Course 3
Module 1: Tools of the Trade
- Deep Learning Frameworks: PyTorch vs. TensorFlow
- Visualization and Experiment Tracking
- AutoML and Data Platforms
- ML Infrastructure and Compute
- Edge / On-Device Inference
- Cloud Inference and Machine Learning as a Service (MLaaS)
- Continuous Integration and Delivery (CI/CD)
Module 2: NLP via Web Apps
- Introduction to web apps and Streamlit
- Build NLP web apps using Streamlit
- Deploy and explore NLP web apps
Module 3: NLP at Scale via Automated Pipelines and APIs
- The data team and the ML lifecycle: data scientists, engineers, and analysts; prototyping, deployment, and maintenance; notebooks and scripts
- Introduction to Spark and Databricks
- Compare speed of NER inference using Databricks vs. Google Colab
- Create and deploy scheduled and event-driven automated machine learning pipelines
- Introduction to MLflow
- Create and deploy machine learning APIs
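The course uses production-grade tooling for deployment; as a minimal standard-library sketch of the shape of a machine learning API, the snippet below serves JSON predictions over HTTP. The keyword-based `predict` "model", the port, and the route are placeholders, not the course's actual implementation:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text):
    """Stand-in for a real NLP model: a trivial keyword-based sentiment
    'classifier' used only to illustrate the request/response contract."""
    label = "positive" if "good" in text.lower() else "negative"
    return {"text": text, "label": label}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, run the model, return JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # POST {"text": "..."} to http://localhost:8000/ to get a prediction.
    HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```

In practice a web framework plus a model registry (e.g. MLflow, covered above) replaces both the hand-rolled server and the dummy model, but the contract (text in, structured prediction out) is the same.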
Prerequisites
Python coding experience
Familiarity with pandas, numpy, and scikit-learn
Understanding of basic machine learning concepts, including supervised learning
Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus