In this 3-part course series, we will provide a foundational understanding of one of the major branches of machine learning: natural language processing, also known as NLP. Historically, humans have used computers to crunch numbers but have relied on the human brain to analyze text and audio.
Now, machines are able to process text and audio in ways that most humans would have considered magical just two decades ago. NLP is a part of our everyday lives in the form of Google Search, Gmail Smart Compose, Google Translate, Amazon Alexa, Apple Siri, Google Assistant, Microsoft Cortana, and other applications. These commercial successes are the reason why NLP has exploded in popularity over the last few years.
We will cover the advances in NLP that have made these commercial successes possible. This is an applied course series: we will use several modern mainstream NLP libraries, including spaCy, fast.ai, and Hugging Face, to develop NLP applications. We will also use PyTorch, scikit-learn, pandas, numpy, and other common data science packages.
Using state-of-the-art NLP models such as BERT and GPT-3, we will solve NLP tasks such as named entity recognition and text classification, training models whose performance is comparable to, or better than, that of out-of-the-box systems.
We will also cover modern NLP techniques such as transfer learning and fine-tuning, and work with the transformer architecture, the most successful neural network architecture in NLP today. Along the way, we will cover core components of the NLP pipeline, including tokenizers and word embeddings.
Finally, we will cover the most important aspect of applied NLP: productionizing the models using automated pipelines, APIs, and web apps.
You can complete the courses in sequence or complete individual courses based on your interest.
Course 1
Module 1: Introduction to NLP
- What is NLP?
- History of NLP
- Motivation for NLP
- Popular NLP applications today
Module 2: Basic NLP
- Define basic NLP tasks
- Introduce modern open-source NLP software libraries: spaCy, fast.ai, and Hugging Face
- Perform basic NLP tasks using spaCy: tokenization, part-of-speech tagging, dependency parsing, chunking, lemmatization, and stemming
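As a toy illustration of what a tokenizer does (spaCy's rule-based tokenizer is far more sophisticated, handling contractions, URLs, and per-language exceptions), a single regular expression can split text into word and punctuation tokens:

```python
import re

def tokenize(text):
    """Naive tokenizer sketch: return runs of word characters or single
    punctuation marks. For illustration only -- spaCy's tokenizer would,
    for example, split "isn't" into "is" + "n't" rather than three pieces."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("NLP is fun, isn't it?"))
# ['NLP', 'is', 'fun', ',', 'isn', "'", 't', 'it', '?']
```

Every downstream step in the pipeline (tagging, parsing, lemmatization) operates on tokens like these, which is why tokenizer quality matters so much.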
Module 3: State-of-the-Art (SOTA) NLP
- Attention Mechanisms and Transformers
- Pretrained Language Models, Transfer Learning, and Fine-tuning
- Application: IMDb Movie Review Sentiment Analysis using fast.ai
Course 2
Module 1: Modern NLP in theory
- The path to NLP’s watershed “ImageNet” moment in 2018
- Word embeddings: one-hot encoding, word2vec, GloVe, fastText, and context-aware pretrained word embeddings
- Sequential models: vanilla recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs)
- Attention mechanisms and Transformers
- ULMFiT, ELMo, BERT, BERTology, GPT-1, GPT-2, and GPT-3
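One-hot encoding, the simplest embedding scheme in the list above, can be sketched in a few lines (the three-word vocabulary is made up for illustration):

```python
def one_hot(vocab):
    """Map each word to a vector with a single 1. The dimensionality
    equals the vocabulary size, and no two words share any similarity --
    the limitation that dense, learned embeddings such as word2vec,
    GloVe, and fastText were designed to overcome."""
    index = {word: i for i, word in enumerate(sorted(vocab))}
    return {w: [1 if i == index[w] else 0 for i in range(len(vocab))]
            for w in vocab}

vectors = one_hot({"cat", "dog", "fish"})
print(vectors["cat"])  # [1, 0, 0]
```

Because every pair of one-hot vectors is orthogonal, "cat" is exactly as dissimilar to "dog" as it is to "fish"; dense embeddings fix this by placing related words near each other in vector space.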
Module 2: Modern NLP in action
- Refresher on pretrained language models, transfer learning, and fine-tuning
- Introduction to common NLP tasks via Hugging Face: sequence classification, question answering, language modeling, text generation, named entity recognition, summarization, and translation
Module 3: Modern NLP applications
- Explore the AG News dataset
- Application #1: Named Entity Recognition (NER)
  - Perform inference using the out-of-the-box spaCy NER model
  - Annotate data using Prodigy
  - Develop a custom NER model using spaCy
  - Compare the custom NER model against the out-of-the-box spaCy NER model
- Application #2: Text Classification
  - Annotate data using Prodigy
  - Develop a text classification model using spaCy
Course 3
Module 1: Tools of the Trade
- Deep Learning Frameworks: PyTorch vs. TensorFlow
- Visualization and Experiment Tracking
- AutoML and Data Platforms
- ML Infrastructure and Compute
- Edge / On-Device Inference
- Cloud Inference and Machine Learning as a Service (MLaaS)
- Continuous Integration and Delivery (CI/CD)
Module 2: NLP via Web Apps
- Introduction to web apps and Streamlit
- Build NLP web apps using Streamlit
- Deploy and explore NLP web apps
Module 3: NLP at Scale via Automated Pipelines and APIs
- The data team and the ML lifecycle: data scientists, engineers, and analysts; prototyping, deployment, and maintenance; notebooks and scripts
- Introduction to Spark and Databricks
- Compare speed of NER inference using Databricks vs. Google Colab
- Create and deploy scheduled and event-driven automated machine learning pipelines
- Introduction to MLflow
- Create and deploy machine learning APIs
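The course uses production-grade tooling for deployment; as a minimal standard-library sketch of the shape of a machine learning API, the snippet below serves JSON predictions over HTTP. The keyword-based `predict` "model", the port, and the route are placeholders, not the course's actual implementation:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text):
    """Stand-in for a real NLP model: a trivial keyword-based sentiment
    'classifier' used only to illustrate the request/response contract."""
    label = "positive" if "good" in text.lower() else "negative"
    return {"text": text, "label": label}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, run the model, return JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # POST {"text": "..."} to http://localhost:8000/ to get a prediction.
    HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```

In practice a web framework plus a model registry (e.g. MLflow, covered above) replaces both the hand-rolled server and the dummy model, but the contract (text in, structured prediction out) is the same.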
Prerequisites
Python coding experience
Familiarity with pandas, numpy, and scikit-learn
Understanding of basic machine learning concepts, including supervised learning
Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus