Data Science with Missing Data

How to do Data Science with Missing Data

This course is available only as part of a subscription plan.

Course Abstract

Training duration: 4 hours

If you've never heard of the "good, fast, cheap" dilemma, it goes something like this: you can have something good and fast, but it won't be cheap. You can have something good and cheap, but it won't be fast. You can have something fast and cheap, but it won't be good. In short, you can pick two of the three, but you can't have all three. If you've tackled a data science problem before, we can all but guarantee that you've run into missing data. How do we handle it? Well, we can avoid it, ignore it, or try to account for it. The problem is that none of these strategies is good, fast, *and* cheap.

DIFFICULTY LEVEL: BEGINNER

Learning Objectives

How to visualize missing data and identify the three types of missing data
How the type of missing data affects whether we should avoid it, ignore it, or account for it
Advantages and disadvantages of each approach
How to visualize and implement each approach
Practical tips for working with missing data
Recommendations for integrating missing-data handling into your workflow
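As a rough illustration of the first two objectives, here is a minimal sketch of a first look at missing data with pandas, followed by a small simulation of why the type of missingness matters. The three types are commonly called MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random); the sketch contrasts only MCAR and MNAR. The DataFrame, column names, and income figures are hypothetical, and this is not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data; NaN/None mark missing values.
df = pd.DataFrame({
    "age": [34, np.nan, 29, 41, np.nan],
    "income": [52000, 61000, np.nan, 58000, np.nan],
    "region": ["north", "south", "south", None, "north"],
})

# Share of missing values in each column.
missing_share = df.isna().mean()

# Rows with no missing values at all (what complete-case analysis keeps).
complete_rows = int(df.notna().all(axis=1).sum())

# How often each row-level pattern of missingness occurs (True = missing).
patterns = df.isna().value_counts()

print(missing_share)
print(f"complete rows: {complete_rows} of {len(df)}")
print(patterns)

# Why the type of missingness matters: simulate a fully observed
# (hypothetical) income variable, then hide values in two different ways.
rng = np.random.default_rng(0)
true_income = rng.normal(loc=50_000, scale=10_000, size=100_000)

# MCAR: every value has the same 30% chance of being missing.
mcar_missing = rng.random(true_income.size) < 0.3
# MNAR: the highest 30% of incomes go unreported, so whether a value is
# missing depends on the unobserved value itself.
mnar_missing = true_income > np.quantile(true_income, 0.7)

# "Ignoring" missing data: average only the values that were observed.
mcar_mean = true_income[~mcar_missing].mean()
mnar_mean = true_income[~mnar_missing].mean()

print("true mean:      ", round(true_income.mean()))
print("observed, MCAR: ", round(mcar_mean))
print("observed, MNAR: ", round(mnar_mean))
```

Under MCAR the observed mean stays close to the true mean, so ignoring the gaps costs mainly sample size; under MNAR the observed mean is biased downward (to roughly 45,000 in this setup), which is why ignoring missing data is riskier when values are missing for reasons tied to the values themselves.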

Instructor

Matt Brems

Global Lead Data Science Instructor | General Assembly

Instructor Bio:

Matt is currently Managing Partner and Principal Data Scientist at BetaVector. His full-time professional data work spans finance, education, consumer packaged goods, and politics, and he earned General Assembly's 2019 "Distinguished Faculty Member of the Year" award. Matt earned his master's degree in statistics from Ohio State. Matt is passionate about responsibly putting the power of machine learning into the hands of as many people as possible and about mentoring folx in data and tech careers. Matt also volunteers with Statistics Without Borders, where he currently serves on the Executive Committee as Marketing & Communications Director.

Course Outline

Module 1: An introduction to missing data

Module 2: Strategies for doing data science with missing data

- Avoid missing data

- Ignore missing data

- Account for missing data

    - Unit missingness

    - Item missingness

Module 3: Practical considerations and warnings
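To make the Module 2 strategies concrete, here is a minimal sketch contrasting "ignore" (complete-case analysis) with two ways of "accounting for" missing data: mean imputation for item missingness (a respondent skipped a question) and a weighting-class adjustment for unit missingness (a sampled person never responded). These are common textbook techniques chosen for illustration; the course may use different ones. All data and column names (`group`, `score`, `age_class`, `responded`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical item-level data: every row responded, but some scores are missing.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [10.0, np.nan, 14.0, 20.0, 22.0, np.nan],
})

# Ignore: complete-case analysis drops every row that has a missing value.
complete_case = df.dropna()

# Account for item missingness: mean imputation fills each gap with the
# observed mean (16.5 here). Simple, but it understates the variance.
imputed = df.copy()
imputed["score"] = imputed["score"].fillna(imputed["score"].mean())

# Hypothetical unit-level data: some sampled people never responded at all.
sample = pd.DataFrame({
    "age_class": ["young"] * 4 + ["old"] * 4,
    "responded": [True, True, False, False, True, True, True, False],
})

# Account for unit missingness with a weighting-class adjustment:
# within each class, respondent weight = (number sampled) / (number responded),
# so respondents stand in for the non-respondents in their class.
n_sampled = sample.groupby("age_class").size()
n_responded = sample.groupby("age_class")["responded"].sum()
weights = n_sampled / n_responded

print(complete_case)
print(imputed)
print(weights)
```

In this toy example each "young" respondent gets weight 2.0 and each "old" respondent about 1.33, so the weighted respondents in each class add back up to the four people sampled. Mean imputation is only one option for item missingness; it keeps every row at the cost of shrinking the spread of the imputed column.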

Background knowledge

This course is for current and aspiring Data Scientists, Data Analysts, Machine Learning Engineers, and AI Product Managers.
Knowledge of the following tools and concepts is useful:
Familiarity with Python and Jupyter notebooks
Some knowledge of the Pandas library is useful but not required

Real-world application

Supervised learning can be used for customer churn modeling, which helps identify which of a business's customers are likely to stop engaging with it and why.
Dynamic pricing for marketing campaigns for goods or services relies on pricing data. Airlines and ride-share services have successfully implemented dynamic price optimization strategies using supervised learning.
Handling missing data well improves modeling in a variety of business applications, including streaming, finance, and e-commerce.
