Hands-on Parallel Computing with Dask and Pandas
The full course is available only as a part of subscription plans.
Training duration: 1 hour 30 min (Hands-on)
Understand the types of problems solved with parallel computing
Identify the major components of Dask: Collection Types and Scheduler
Be familiar with types of parallel processing provided by Dask
Understand how graphs represent tasks with dependencies
Be able to explain the difference between Pandas and Dask DataFrames
Be able to examine graph processes using the scheduler dashboard
Instructor Bio:
David Yerrington
Module 1: Intro to Parallel Computing and Dask
In this module, we will briefly review the concept of parallel computing and then explore how Dask works, hands-on, using Jupyter notebooks.
- Understand the types of problems solved with parallel computing
- Identify the major components of Dask: Collection Types and Scheduler
- Be familiar with types of parallel processing provided by Dask
- Understand how graphs represent tasks with dependencies
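To make the last objective concrete, here is a minimal sketch of a task graph with dependencies. It is written in plain Python and is not Dask's scheduler: it only borrows the dictionary-of-tuples layout that Dask documents for its graphs, where each key maps to either a literal value or a tuple of a callable followed by the keys it depends on. The function names (`load`, `double`, `square`, `total`, `run`) and the graph itself are hypothetical and exist only for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def load(n):
    return list(range(n))


def double(xs):
    return [2 * x for x in xs]


def square(xs):
    return [x * x for x in xs]


def total(xs, ys):
    return sum(xs) + sum(ys)


# Each key maps to a literal value or to (callable, *keys_it_depends_on).
# "doubled" and "squared" both depend only on "raw", so a parallel
# scheduler would be free to run them at the same time.
graph = {
    "n": 4,
    "raw": (load, "n"),
    "doubled": (double, "raw"),
    "squared": (square, "raw"),
    "result": (total, "doubled", "squared"),
}


def run(graph, target):
    """Toy serial executor: evaluate tasks in dependency order."""
    # Map every key to the set of keys it depends on.
    deps = {
        key: set(task[1:]) if isinstance(task, tuple) else set()
        for key, task in graph.items()
    }
    results = {}
    # static_order() yields each key only after all of its dependencies.
    for key in TopologicalSorter(deps).static_order():
        task = graph[key]
        if isinstance(task, tuple):
            func, *arg_keys = task
            results[key] = func(*(results[k] for k in arg_keys))
        else:
            results[key] = task
    return results[target]


answer = run(graph, "result")
```

This executor runs one task at a time. The point of describing work as a graph is that a scheduler can see which tasks share no dependency path and dispatch those to separate workers.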
Module 2: Pandas vs Dask
One of the most useful aspects of Dask is its DataFrame data type, which is modeled after the Pandas API. We will work through a few examples together that show how Dask and Pandas are similar and how to use them together effectively.
- Be able to explain the difference between Pandas and Dask DataFrames
- Be able to examine graph processes using the scheduler dashboard
Strong understanding of Python and Pandas required
Knowledge of Pandas aggregation and core data transformation methods
Ability to configure a Python environment and install packages
Familiarity with Jupyter Notebooks