Good, Fast, Cheap: How to do Data Science with Missing Data

Past live training: available on-demand

Live training with Matt Brems starts on September 21st at 12 PM (ET)

Training duration: 4 hrs (Hands-on)

Regular price: $210.00

Price with 10% discount: $189.00


Course Outline

1. Introduction to Missing Data

2. Strategies for doing Data Science with Missing Data

Avoid Missing Data
Ignore Missing Data

3. Account for Missing Data

Unit missingness vs. item missingness.
Weight class adjustments for unit missingness
The three types of missing data: MCAR, MAR, NMAR
Imputation techniques (deductive, single, multiple)
Pattern submodel method
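As a rough illustration of the weight class adjustment listed above, the sketch below reweights respondents so that each class counts in proportion to how many units were originally sampled from it. The survey data, the class labels ("young", "old"), and the function names are hypothetical, invented for this example rather than taken from the course materials.

```python
from collections import Counter


def weight_class_adjustment(sampled_classes, respondent_classes):
    """Return one weight per class: (units sampled) / (units that responded)."""
    sampled = Counter(sampled_classes)
    responded = Counter(respondent_classes)
    weights = {}
    for cls, n_sampled in sampled.items():
        n_responded = responded.get(cls, 0)
        if n_responded == 0:
            # A class with no respondents cannot be reweighted on its own.
            raise ValueError(f"no respondents in class {cls!r}; merge it with another class")
        weights[cls] = n_sampled / n_responded
    return weights


def weighted_mean(values, classes, weights):
    """Mean of respondent values, each weighted by its class weight."""
    total_weight = sum(weights[c] for c in classes)
    return sum(v * weights[c] for v, c in zip(values, classes)) / total_weight


# Hypothetical survey: 10 people invited (5 per class), 6 responded.
sampled_classes = ["young"] * 5 + ["old"] * 5
respondent_classes = ["young"] * 2 + ["old"] * 4
respondent_values = [10.0, 12.0, 20.0, 22.0, 18.0, 20.0]

weights = weight_class_adjustment(sampled_classes, respondent_classes)
naive = sum(respondent_values) / len(respondent_values)
adjusted = weighted_mean(respondent_values, respondent_classes, weights)

print(weights)
print(f"naive mean: {naive:.2f}, adjusted mean: {adjusted:.2f}")
```

The adjustment only helps when nonresponse is related to the class variable and respondents resemble nonrespondents within each class; that is an assumption of this sketch, not something the data can confirm.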

4. Putting it together in a workflow

5. Practical considerations and warnings

Instructor Bio:

Matt Brems

Global Lead Data Science Instructor | General Assembly

Matt is currently Managing Partner and Principal Data Scientist at BetaVector. His full-time professional data work spans finance, education, consumer packaged goods, and politics, and he earned General Assembly's 2019 "Distinguished Faculty Member of the Year" award. Matt earned his Master's degree in statistics from Ohio State. Matt is passionate about responsibly putting the power of machine learning into the hands of as many people as possible and mentoring folx in data and tech careers. Matt also volunteers with Statistics Without Borders and currently serves on their Executive Committee as the Marketing & Communications Director.

DIFFICULTY LEVEL: BEGINNER - INTERMEDIATE

Course Abstract

If you've never heard of the good, fast, cheap dilemma, it goes something like this: You can have something good and fast, but it won't be cheap. You can have something good and cheap, but it won't be fast. You can have something fast and cheap, but it won't be good. In short, you can pick two of the three, but you can't have all three.

If you've tackled a data science problem before, I can all but guarantee that you've run into missing data. How do we handle it? Well, we can avoid, ignore, or try to account for missing data. The problem is, none of these strategies are good, fast, *and* cheap.

We'll start by visualizing missing data and identifying the three different types of missing data, which will allow us to see how they affect whether we should avoid, ignore, or account for the missing data. We will walk through the advantages and disadvantages of each approach as well as how to visualize and implement each one. We'll wrap up with practical tips for working with missing data and recommendations for integrating these techniques into your workflow!
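Before choosing whether to avoid, ignore, or account for missing data, it helps to see how much is missing and in what combinations. The sketch below is a minimal, text-only way to do that with the Python standard library; the toy rows, the column names, and the helper functions are all hypothetical, and a real project would more likely use a dataframe library and a plotting tool.

```python
from collections import Counter

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "score": 0.7},
    {"age": None, "income": 48000, "score": 0.4},
    {"age": 29, "income": None, "score": None},
    {"age": 41, "income": None, "score": 0.9},
]


def missing_fraction(rows):
    """Fraction of rows in which each column is missing."""
    columns = rows[0].keys()
    return {col: sum(r[col] is None for r in rows) / len(rows) for col in columns}


def missing_patterns(rows):
    """Count how often each combination of missing columns occurs."""
    columns = list(rows[0].keys())
    return Counter(tuple(col for col in columns if r[col] is None) for r in rows)


fractions = missing_fraction(rows)
patterns = missing_patterns(rows)

# Crude text bar chart: one '#' per 5% of rows missing.
for col, frac in fractions.items():
    print(f"{col:>7} {'#' * round(frac * 20):<20} {frac:.0%}")

for pattern, count in patterns.most_common():
    print(f"{count} row(s) missing: {', '.join(pattern) or 'nothing'}")
```

Columns that tend to go missing together (here, income and score in one row) are often the first hint that missingness is not purely random.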

Learning Objectives

By the end of this course, you should be able to:

Describe the impact of missing data using simulations
Identify techniques for avoiding missing data and give specific examples of each
Define unit and item missingness, and identify when they occur
Implement weight class adjustments, and identify advantages and disadvantages of this technique
Define and give examples of data that are missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR)
Describe a workflow for doing data science with missing data
Describe proper regression imputation and the pattern submodel method
Select the best missing data technique given your situation and real-world constraints
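Several of these objectives (simulating the impact of missing data, and telling MCAR, MAR, and NMAR apart) can be previewed with a small simulation. The sketch below is one possible setup, not the course's own: the variables `x` and `y`, the drop probabilities, and the helper `observed_mean` are assumptions chosen for illustration. It deletes values of `y` under each mechanism and compares the mean of what remains with the mean of the full data.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 20_000

# x is always observed; y depends on x and is the variable that goes missing.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2 + xi + rng.gauss(0, 1) for xi in x]


def observed_mean(y, drop_prob):
    """Mean of y over the rows kept; drop_prob[i] is row i's chance of being missing."""
    kept = [yi for yi, p in zip(y, drop_prob) if rng.random() >= p]
    return fmean(kept)


full_mean = fmean(y)

# MCAR: every row has the same chance of being missing.
mcar_mean = observed_mean(y, [0.3] * n)

# MAR: missingness depends only on the observed variable x.
mar_mean = observed_mean(y, [0.6 if xi > 0 else 0.1 for xi in x])

# NMAR: missingness depends on the value of y itself.
nmar_mean = observed_mean(y, [0.6 if yi > 2 else 0.1 for yi in y])

for name, value in [("full", full_mean), ("MCAR", mcar_mean),
                    ("MAR", mar_mean), ("NMAR", nmar_mean)]:
    print(f"{name:>4} mean of y: {value:.3f}")
```

In this setup, simply ignoring the missing rows leaves the MCAR estimate close to the full-data mean, while the MAR and NMAR estimates are pulled downward because larger values of `y` are more likely to be dropped. Under MAR the bias can in principle be corrected using `x`; under NMAR the observed data alone do not reveal how to correct it.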

What is included in your ticket?

Access to live training and Q&A session with the instructor
Access to the on-demand recording
Certificate of completion

Who should attend?

Any data professional who has encountered missing data, or anyone who is interested in learning more about missing data. There is no required background knowledge, though an understanding of linear regression, standard deviation, and Python programming can be helpful.