Course Abstract

Training duration: 90 min (Hands-on)

In this 90-minute course, Ankur Patel will discuss how to deploy NLP models to production. We will start with a broad overview of the decisions you will have to make as you determine your ML software stack, including a comparison of PyTorch vs. TensorFlow and a walkthrough of AWS vs. GCP vs. Azure. Then we will move on to productionizing NLP models as web apps: we will build several NLP web apps with the open-source library Streamlit, deploy them, and explore them in a web browser. Finally, we will perform NLP at scale using automated pipelines and APIs, leveraging Spark clusters on Databricks, and we will review the entire ML lifecycle from prototyping to deployment to model maintenance. By the end of this course, you should have a good grasp of modern NLP, including how to develop and deploy models to production.

DIFFICULTY LEVEL: INTERMEDIATE

Instructor Bio:

Ankur Patel

Co-founder and Head of Data | Glean

Ankur Patel is the co-founder and Head of Data at Glean. Glean uses NLP to extract data from invoices and generate vendor spend intelligence for clients. Ankur is an applied machine learning specialist in both unsupervised learning and natural language processing, and he is the author of Hands-on Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data and Applied Natural Language Processing in the Enterprise: Teaching Machines to Read, Write, and Understand. Previously, Ankur led teams at 7Park Data, ThetaRay, and R-Squared Macro and began his career at Bridgewater Associates and J.P. Morgan. He is a graduate of Princeton University and currently resides in New York City.

Course Outline

Module 1: Tools of the Trade

  • Deep Learning Frameworks: PyTorch vs. TensorFlow
  • Visualization and Experiment Tracking
  • AutoML and Data Platforms
  • ML Infrastructure and Compute
  • Edge / On-Device Inference
  • Cloud Inference and Machine Learning as a Service (MLaaS)
  • Continuous Integration and Delivery (CI/CD)


Module 2: NLP via Web Apps

  • Introduction to web apps and Streamlit
  • Build NLP web apps using Streamlit (see the sketch after this list)
  • Deploy and explore NLP web apps
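
As a preview of the hands-on portion of this module, below is a minimal sketch of a Streamlit NLP web app. The file name, the spaCy en_core_web_sm pipeline, and the named-entity-recognition task are illustrative assumptions, not the exact apps built in the course.

```python
# streamlit_app.py -- minimal sketch of a Streamlit NLP web app.
# Assumes streamlit and spaCy are installed and the small English model
# has been downloaded (python -m spacy download en_core_web_sm).
import spacy
import streamlit as st

st.title("Named Entity Recognition Demo")

# Load a pretrained spaCy pipeline (a stand-in for whatever NLP model the app serves).
nlp = spacy.load("en_core_web_sm")

text = st.text_area("Enter text to analyze", "Glean extracts line items from invoices.")

if st.button("Analyze"):
    doc = nlp(text)
    if doc.ents:
        for ent in doc.ents:
            st.write(f"{ent.text} ({ent.label_})")
    else:
        st.write("No entities found.")
```

Running `streamlit run streamlit_app.py` starts a local server and opens the app in a web browser, which is the same deploy-and-explore flow the module walks through.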


Module 3: NLP at Scale via Automated Pipelines and APIs

  • The data team and the ML lifecycle: data scientists, engineers, and analysts; prototyping, deploying, and maintenance; notebooks and scripts
  • Introduction to Spark and Databricks
  • Set up and explore Databricks
  • Create and deploy scheduled automated machine learning pipelines
  • Introduction to MLflow (see the tracking sketch after this list)
  • Create and deploy machine learning APIs
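
For a flavor of the experiment tracking covered in this module, here is a minimal MLflow sketch that logs parameters, a metric, and a trained scikit-learn text classifier. The TF-IDF plus logistic regression model and the toy data are illustrative stand-ins, not the course's actual pipeline.

```python
# train_and_log.py -- minimal MLflow tracking sketch (illustrative, not the course pipeline).
import mlflow
import mlflow.sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy labeled data; in practice this would come from your data platform.
texts = ["great product, fast shipping", "terrible service", "loved it", "never again"]
labels = [1, 0, 1, 0]

with mlflow.start_run():
    model = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression()),
    ])
    model.fit(texts, labels)

    # Log parameters, metrics, and the model artifact to the tracking server.
    mlflow.log_param("vectorizer", "tfidf")
    mlflow.log_metric("train_accuracy", model.score(texts, labels))
    mlflow.sklearn.log_model(model, "model")
```

A model logged this way can be served as a REST API, for example with MLflow's built-in `mlflow models serve -m runs:/<run_id>/model` command, which is one route to the machine learning APIs bullet above; on Databricks, the same training script can be attached to a scheduled job to form an automated pipeline.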


Conclusion

Background Knowledge

  • Python coding experience

  • Familiarity with pandas, numpy, and scikit-learn

  • Understanding of basic machine learning concepts, including supervised learning

  • Experience with deep learning and frameworks such as TensorFlow or PyTorch is a plus

Applicable Use Cases

  • Semantic Search: search the entire web or a repository of documents and surface relevant search results (e.g., Google Search)

  • Natural Language Generation: auto-complete sentences as you write emails or draft documents (e.g., Gmail)

  • Machine Translation: convert text and audio from one language to another (e.g., Google Translate and Apple Translate)

  • Speech Recognition and Question Answering: give voice commands and control your home devices (e.g., Amazon Alexa, Apple Siri, Google Assistant, and Microsoft Cortana)

  • Customer Service Chatbots: ask account-related questions and get mostly reasonable answers (e.g., Intercom)