How to do Data Science with Missing Data
This course is available only as part of a subscription plan
Training duration: 4 hours
If you've never heard of the "good, fast, cheap" dilemma, it goes something like this: You can have something good and fast, but it won't be cheap. You can have something good and cheap, but it won't be fast. You can have something fast and cheap, but it won't be good. In short, you can pick two of the three but you can't have all three. If you've tackled a data science problem before, we can all but guarantee that you've run into missing data. How do we handle it? Well, we can avoid, ignore, or try to account for missing data. The problem is, none of these strategies are good, fast, *and* cheap.
How to visualize missing data and identify the three different types of missing data
How the type of missing data affects whether we should avoid, ignore, or account for it
Advantages and disadvantages of each approach
How to visualize and implement each approach
Practical tips for working with missing data
Recommendations for integrating these approaches into your workflow
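As a rough illustration of the first learning goal, here is a minimal sketch of how missingness might be summarized with pandas before any plotting. The toy DataFrame and its column names are hypothetical and are not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the columns are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 41, 33, np.nan],
    "income": [52000, 61000, np.nan, 48000, 75000],
    "city": ["Austin", "Boston", "Chicago", "Denver", "El Paso"],
})

# Boolean mask: True wherever a value is missing.
missing_mask = df.isna()

# Share of missing values in each column.
missing_share = missing_mask.mean()

# Number of missing values in each row.
missing_per_row = missing_mask.sum(axis=1)

print(missing_share)
print(missing_per_row)
```

The boolean mask is also a convenient starting point for a visual check, since it can be passed to a heatmap-style plot to show where the gaps cluster.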
Instructor Bio:
Matt Brems
Module 1: An introduction to missing data
Module 2: Strategies for doing data science with missing data
- Avoid missing data
- Ignore missing data
- Account for missing data
- Unit missingness
- Item missingness
Module 3: Practical considerations and warnings
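To make the "ignore" and "account for" strategies in the outline above concrete, the sketch below contrasts dropping incomplete rows with filling gaps by the column mean. The data are hypothetical, and mean imputation is used only as a simple stand-in for "accounting for" missing data; it is not necessarily the method taught in the course.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with item missingness (some values absent per row).
df = pd.DataFrame({
    "age": [25.0, np.nan, 41.0, 33.0, np.nan],
    "income": [52000.0, 61000.0, np.nan, 48000.0, 75000.0],
})

# "Ignore": keep only rows with no missing values (listwise deletion).
complete_cases = df.dropna()

# "Account for": fill each gap with its column mean (an assumed, simple choice).
imputed = df.fillna(df.mean())

# Compare how many rows survive and how the spread of "age" changes.
print(len(df), len(complete_cases), len(imputed))
print(df["age"].std(), imputed["age"].std())
```

Dropping rows is fast but discards data, while mean imputation keeps every row at the cost of understating the variability of the filled column. This is the kind of trade-off the course description alludes to.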
This course is for current and aspiring Data Scientists, Data Analysts, Machine Learning Engineers, and AI Product Managers.
Knowledge of the following tools and concepts is useful:
Familiarity with Python and Jupyter notebooks
Some knowledge of the pandas library is useful, but not required