Course curriculum
Regardless of where you are in your data science career, you will eventually be confronted with datasets that cannot fit into the memory of a single machine, and with the problems that often come with this situation. In this talk, we will review key strategies that will help you adapt to your growing datasets. Importantly, we will consider when you might choose one strategy over another. We will discuss different approaches you can take to adapt your data so that it fits in your existing analysis framework. Then we will review the steps you can take when the analysis is simply too big to fit in the RAM of a single machine. We will examine how you might speed up calculations with parallel processes, GPUs, or both, and with frameworks such as Python’s Dask and R’s future package. This discussion will equip you with strategies to tackle larger datasets. More data does not have to mean more problems!
1. What to Do When Your Data Gets Big
Instructor
Senior Data Scientist, Saturn Cloud
Nathan Ballou