Course curriculum

Regardless of where you are in your data science career, you will eventually be confronted with datasets that cannot fit into the memory of a single machine, along with the problems that often come with this situation. In this talk, we will review key strategies that will help you adapt to your growing datasets. Importantly, we will consider when you might choose one strategy over another. We will first discuss approaches you can take to adapt your data so that it fits in your existing analysis framework. Then we will review the steps you can take when the analysis is simply too big to fit in the RAM of a single machine. We will examine how you might speed up calculations by using parallel processes, GPUs, or both, and by using frameworks such as Python’s Dask and the R future package. This discussion will equip you with strategies to tackle larger datasets. More data does not have to mean more problems!

  • 1. What to Do When Your Data Gets Big

    • What to Do When Your Data Gets Big

Instructor

Senior Data Scientist, Saturn Cloud

Nathan Ballou

Nathan Ballou is a Senior Data Scientist at Saturn Cloud, a cloud workspace for the whole data science team. Prior to working at Saturn Cloud, Nathan worked as a data science consultant and as an operations research analyst. When Nathan’s not evangelising machine learning at Saturn Cloud, he can be found rowing on the Patapsco River in Baltimore.

ODSC APAC Virtual

Don't miss the chance to be among the first 100 to register for APAC 2022.