Live training with Matt Brems starts on June 15th at 1 PM (ET)

Training duration: 4 hours (Hands-on)

Regular Price: $210.00

Sign up for the Premium Plan and get an additional 10-35% discount on live training

Instructor Bio:

Matt Brems

Global Lead Data Science Instructor | General Assembly

Matt is currently Managing Partner and Principal Data Scientist at BetaVector. His full-time professional data work spans finance, education, consumer-packaged goods, and politics, and he earned General Assembly's 2019 "Distinguished Faculty Member of the Year" award. Matt earned his Master's degree in statistics from Ohio State. He is passionate about responsibly putting the power of machine learning into the hands of as many people as possible and about mentoring folx in data and tech careers. Matt also volunteers with Statistics Without Borders and currently serves on its Executive Committee as the Marketing & Communications Director.

Learning Objectives

  • Clean text data with regular expressions and tokenization

  • Learn lemmatizing and stemming, including how and when to use these techniques

  • Transform data with CountVectorizer and TfidfVectorizer

  • Fit machine learning models in scikit-learn and evaluate their performance

  • Build pipelines and run a grid search (GridSearchCV) over NLP hyperparameters

DIFFICULTY LEVEL: BEGINNER

Course Abstract

How many times a day do you use a search engine or autocorrect? Do you translate text from one language to another? Getting computers to understand language the way humans do is the key to solving many problems. There is a lot to learn, and this course is the perfect place to start.

We'll start by defining natural language processing (NLP) and exploring its uses. We'll see how NLP is biased and how to proactively reduce bias. We'll walk through the process of tackling NLP problems, including cleaning text data and converting it into a form that lets us build models with it. We'll cover vectorizers, hyperparameters, and pipelines. You'll come away with a full understanding of how to tackle an NLP problem.

All of this will be done in Python, and you'll know how to do these things yourself because we'll do them together. If you don't have a strong Python background right now, or if you don't know much about machine learning yet, that's OK! We'll assume no prior knowledge and get you set up. This course is perfect for beginners, for those who want to learn how to do these things in Python, and for those who want to refresh their skills.

Course Outline

Module 1: Introduction to Natural Language Processing (NLP)

- What is natural language processing?

- What are the applications of NLP?

- What is bias in NLP?


Module 2: Cleaning Text Data

- What is tokenizing, and how do we do it?

- What are regular expressions (RegEx), and how can they be used?

- What are lemmatizing and stemming?
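
As a rough illustration of the tokenizing and RegEx questions in this module, the sketch below splits a sentence into tokens using only Python's standard library. The sample sentence and the `tokenize` helper are assumptions made for illustration, not course material. Lemmatizing and stemming usually rely on a library such as NLTK or spaCy and are not shown here.

```python
import re


def tokenize(text):
    """Lowercase the text and return runs of letters, digits, or apostrophes."""
    # [a-z0-9']+ matches one or more letters, digits, or apostrophes in a row,
    # so punctuation and whitespace act as token boundaries.
    return re.findall(r"[a-z0-9']+", text.lower())


# Hypothetical sample sentence.
tokens = tokenize("Don't panic: NLP isn't magic, it's statistics!")
print(tokens)
```

Keeping apostrophes inside the pattern is one design choice among several; it keeps contractions such as "don't" as single tokens instead of splitting them in two.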


Module 3: Converting Text Data to Model Features

- What is vectorizing?

- How do we properly construct training and testing sets when working with NLP vectors?

- What is CountVectorizer and when should it be used?

- What is TfidfVectorizer and when should it be used?
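
The questions in this module can be made concrete with a small sketch, assuming scikit-learn is installed. The toy sentences are invented for illustration. The point it tries to show is that a vectorizer learns its vocabulary from the training documents only, and the same fitted vectorizer is then reused on the test documents.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, already split into training and testing documents.
train_docs = ["the cat sat on the mat", "the dog chased the cat"]
test_docs = ["the dog sat on the log"]

# CountVectorizer: each column is a word, each value is a raw count.
count_vec = CountVectorizer()
X_train_counts = count_vec.fit_transform(train_docs)  # learn vocabulary here
X_test_counts = count_vec.transform(test_docs)        # reuse it, do not refit

# TfidfVectorizer: same columns, but counts are reweighted so that words
# appearing in many documents contribute less.
tfidf_vec = TfidfVectorizer()
X_train_tfidf = tfidf_vec.fit_transform(train_docs)
X_test_tfidf = tfidf_vec.transform(test_docs)

vocab = sorted(count_vec.vocabulary_)
print(vocab)
print(X_test_counts.toarray())
```

Because "log" never appears in the training documents, it has no column and is silently dropped from the test matrix. Fitting on the training set alone keeps information from the test set out of the features.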


Module 4: Hyperparameters in NLP

- What are hyperparameters? What are NLP hyperparameters?

- What are stop words and how do they affect our model?

- What are n-grams and how do they affect our model?

- What are max_features, max_df, and min_df, and how do they affect our model?
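
To give a feel for how these hyperparameters change the features, here is a minimal sketch, assuming scikit-learn is installed. The three short reviews and the `vocabulary` helper are hypothetical and exist only for this illustration. Each call fits a CountVectorizer with one setting changed and returns the resulting vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus.
docs = [
    "the movie was great",
    "the movie was terrible",
    "the acting was great",
]


def vocabulary(**kwargs):
    """Fit a CountVectorizer with the given settings; return its sorted vocabulary."""
    vec = CountVectorizer(**kwargs)
    vec.fit(docs)
    return sorted(vec.vocabulary_)


baseline = vocabulary()                          # defaults: single words only
no_stop_words = vocabulary(stop_words="english")  # drop common English words
bigrams_only = vocabulary(ngram_range=(2, 2))     # two-word phrases only
common_terms = vocabulary(min_df=2)               # keep terms in >= 2 documents
rarer_terms = vocabulary(max_df=0.7)              # drop terms in > 70% of documents
top_two = vocabulary(max_features=2)              # keep the 2 most frequent terms

for name, vocab in [("baseline", baseline), ("min_df=2", common_terms),
                    ("max_df=0.7", rarer_terms), ("max_features=2", top_two)]:
    print(name, vocab)
```

Here `max_df=0.7` removes "the" and "was" because they occur in all three documents, which has a similar effect to a stop-word list on this tiny corpus.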


Module 5: Machine Learning with Pipelines in NLP

- What should we consider when fitting machine learning models in NLP?

- What are pipelines and grid search (GridSearchCV)?

- How do we automate model selection with pipelines?
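
One possible shape for the pipeline and grid search ideas in this module is sketched below, assuming scikit-learn is installed. The eight labeled sentences, the choice of logistic regression, and the small parameter grid are all assumptions made for illustration and are not taken from the course.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical labeled data: 1 = positive review, 0 = negative review.
texts = [
    "I loved this movie, it was great",
    "What a wonderful and fun film",
    "Great acting and a great story",
    "An excellent, enjoyable movie",
    "I hated this movie, it was terrible",
    "What a boring and awful film",
    "Terrible acting and a dull story",
    "A bad, disappointing movie",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The pipeline chains the vectorizer and the model, so the vectorizer is
# refit on the training portion of every cross-validation split.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Keys follow the "<step name>__<parameter>" convention.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__stop_words": [None, "english"],
}

# Try all 4 combinations with 2-fold cross-validation.
search = GridSearchCV(pipe, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
predictions = search.predict(["a great and wonderful film"])
print(predictions)
```

With so little data the selected parameters are not meaningful; the sketch is only meant to show how the vectorizer's hyperparameters and the model can be tuned together in one object.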

Which knowledge and skills should you have?

  • We assume no background in NLP

  • All code is written in Python, so experience is helpful. However, solutions are provided, so a Python background is not required

  • Some experience with machine learning is helpful, but not necessary

What is included in your ticket?

  • Access to the live training and a Q&A session with the instructor

  • Access to the on-demand recording

  • Certificate of completion
