Get Ahead with Expert-Led Training in Advanced Natural Language Processing

In this 3-part course series, we will provide a foundational understanding of one of the major branches of machine learning: natural language processing, also known as NLP. Historically, humans have used computers to crunch numbers but have relied on the human brain to analyze text and audio. 

Now, machines are able to process text and audio in ways that most humans would have considered magical just two decades ago. NLP is a part of our everyday lives in the form of Google Search, Gmail Smart Compose, Google Translate, Amazon Alexa, Apple Siri, Google Assistant, Microsoft Cortana, and other applications. These commercial successes have fueled NLP's explosion in popularity over the last few years.

We will cover the advances in NLP that have made these commercial successes possible. This is an applied course: we will develop NLP applications using several modern, mainstream NLP libraries, including spaCy, fast.ai, and Hugging Face. We will also use PyTorch, scikit-learn, pandas, NumPy, and other common data science packages.

Using state-of-the-art NLP models such as BERT and GPT-3, we will solve tasks such as named entity recognition and text classification. We will train NLP models with performance comparable to or better than that of out-of-the-box systems.

We will also cover modern NLP techniques such as transfer learning and fine-tuning, and we will work with the transformer architecture, the most successful neural network architecture in NLP today. Along the way, we will cover core components of the NLP pipeline such as tokenizers, word embeddings, and more.

Finally, we will cover the most important aspect of applied NLP: productionizing the models using automated pipelines, APIs, and web apps.

You can take the courses in sequence or choose individual courses based on your interests.

Instructor

Ankur Patel

Co-founder and Head of Data | Glean

Ankur Patel is the co-founder and Head of Data at Glean, which uses NLP to extract data from invoices and generate vendor spend intelligence for clients. Ankur is an applied machine learning specialist in both unsupervised learning and natural language processing. He is the author of Hands-On Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data and Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand. Previously, Ankur led teams at 7Park Data, ThetaRay, and R-Squared Macro, and he began his career at Bridgewater Associates and J.P. Morgan. He is a graduate of Princeton University and currently resides in New York City.

Advanced NLP 1: Overview of Basic to State-of-the-Art NLP

Module 1: Introduction to NLP


  • What is NLP?
  • History of NLP
  • Motivation for NLP
  • Popular NLP applications today


Module 2: Basic NLP


  • Define basic NLP tasks
  • Introduce modern open-source NLP software libraries: spaCy, fast.ai, and Hugging Face
  • Perform basic NLP tasks using spaCy: tokenization, part-of-speech tagging, dependency parsing, chunking, lemmatization, and stemming


Module 3: State-of-the-Art (SOTA) NLP

  • Attention mechanisms and Transformers
  • Pretrained language models, transfer learning, and fine-tuning
  • Application: IMDb movie review sentiment analysis using fast.ai


Conclusion

Advanced NLP 2: Modern NLP in Depth, from Theory to Action

Module 1: Modern NLP in theory

  • The path to NLP’s watershed “ImageNet” moment in 2018
  • Word embeddings: one-hot encoding, word2vec, GloVe, fastText, and context-aware pretrained word embeddings
  • Sequential models: vanilla recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs)
  • Attention mechanisms and Transformers
  • ULMFiT, ELMo, BERT, BERTology, GPT-1, GPT-2, and GPT-3
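The attention mechanism listed above is the core building block of the Transformer. As an illustration only (not material taken from the course), the following is a minimal NumPy sketch of single-head scaled dot-product attention, softmax(QKᵀ / √d_k)·V, as defined in the original Transformer paper. The function name and the random toy inputs are hypothetical; a real Transformer also applies learned query/key/value projections, uses multiple heads, and may apply masking, all of which are omitted here.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns (output, weights); output has shape (n_queries, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Subtract each row's max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    # Softmax over the keys: each row of weights sums to 1.
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V, weights


# Toy input: 3 "tokens", each a random 4-dimensional vector (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))

# Self-attention: queries, keys, and values all come from the same sequence.
# (Learned projections W_Q, W_K, W_V are hypothetical here and omitted.)
output, weights = scaled_dot_product_attention(X, X, X)
print(weights.round(3))  # 3x3 attention weights; each row sums to 1
print(output.shape)      # (3, 4)
```

Because each output row is a weighted average of all value vectors, every token can draw on every other token in a single step, in contrast to the step-by-step processing of the RNNs, LSTMs, and GRUs listed above.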


Module 2: Modern NLP in action

  • Refresher on pretrained language models, transfer learning, and fine-tuning
  • Introduction to common NLP tasks via Hugging Face: sequence classification, question answering, language modeling, text generation, named entity recognition, summarization, and translation


Module 3: Modern NLP applications

  • Explore the dataset: AG News
  • Application #1: Named Entity Recognition (NER)
      ◦ Perform inference using the out-of-the-box spaCy NER model
      ◦ Annotate data using Prodigy
      ◦ Develop a custom NER model using spaCy
      ◦ Compare the custom NER model against the out-of-the-box spaCy NER model
  • Application #2: Text Classification
      ◦ Annotate data using Prodigy
      ◦ Develop a text classification model using spaCy


Conclusion

Advanced NLP 3: NLP in Production via Web Apps, Automated Pipelines, and APIs

Module 1: Tools of the Trade

  • Deep Learning Frameworks: PyTorch vs. TensorFlow
  • Visualization and Experiment Tracking
  • AutoML and Data Platforms
  • ML Infrastructure and Compute
  • Edge / On-Device Inference
  • Cloud Inference and Machine Learning as a Service (MLaaS)
  • Continuous Integration and Delivery (CI/CD)


Module 2: NLP via Web Apps

  • Introduction to web apps and Streamlit
  • Build NLP web apps using Streamlit
  • Deploy and explore NLP web apps


Module 3: NLP at Scale via Automated Pipelines and APIs

  • The data team and the ML lifecycle: data scientists, engineers, and analysts; prototyping, deployment, and maintenance; notebooks and scripts
  • Introduction to Spark and Databricks
  • Compare the speed of NER inference on Databricks vs. Google Colab
  • Create and deploy scheduled and event-driven automated machine learning pipelines
  • Introduction to MLflow
  • Create and deploy machine learning APIs


Conclusion

Background knowledge

  • Python coding experience

  • Familiarity with pandas, numpy, and scikit-learn

  • Understanding of basic machine learning concepts, including supervised learning

  • Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus

Applicable Use Cases

  • Semantic Search: search the entire web or a repository of documents and surface relevant search results (e.g., Google Search)

  • Natural Language Generation: auto-complete sentences as you write emails or draft documents (e.g., Gmail)

  • Machine Translation: convert text and audio from one language to another (e.g., Google Translate and Apple Translate)

  • Speech Recognition and Question Answering: give voice commands and control your home devices (e.g., Amazon Alexa, Apple Siri, Google Assistant, and Microsoft Cortana)

  • Customer Service Chatbots: ask account-related questions and get mostly reasonable answers (e.g., Intercom)