Live training with David Yerrington starts on February 4th at 1 PM (ET)
Training duration: 3 hours
Subscribe now and get a 7-day free trial
Sign up for a Basic or Premium plan and get an additional 10-35% discount on live training
Instructor
David Yerrington
Data Science Consultant | Yerrington Consulting
By the end of the course, participants will be able to:
- Understand the types of problems solved with parallel computing
- Identify the major components of Dask: Collection Types and Scheduler
- Be familiar with types of parallel processing provided by Dask
- Understand how graphs represent tasks with dependencies
- Explain the difference between Pandas and Dask DataFrames
- Know how to examine graph processes using the scheduler dashboard
Course Abstract
One major problem encountered in the data science world is scalability. Working on a single computer limits how much and how fast you can process data. Most real-world datasets are bigger than a single computer can process, so learning a parallel computing framework becomes increasingly necessary to be productive. In this session, you will learn how to work, hands-on, with the Dask framework to build scalable transformations to support analytic applications.
Course Schedule
Module 1: Intro to Parallel Computing
This module will briefly examine the concept of parallel computing and which ideas are most relevant to how Dask works.
- Understand the types of problems solved with parallel computing
- Identify the major components of Dask: Collection Types and its Scheduler
- Be familiar with types of parallel processing provided by Dask
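As a rough illustration of the two components named above, the sketch below builds a small Dask Bag (one of the collection types) and runs the same computation under two of the local schedulers. The data and variable names are invented for illustration and are not taken from the course materials.

```python
import dask.bag as db

# Collection: a Bag splits a sequence into partitions that can be
# processed independently of each other.
numbers = db.from_sequence(range(10), npartitions=2)

# Building the computation is lazy; nothing has run yet.
sum_of_squares = numbers.map(lambda x: x * x).sum()

# Scheduler: the same task graph can be executed in different ways.
threaded = sum_of_squares.compute(scheduler="threads")      # thread pool
serial = sum_of_squares.compute(scheduler="synchronous")    # single thread, handy for debugging

print(threaded, serial)
```

Both calls should return the same value; only the way the work is executed changes.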
Module 2: Intro to Dask
This module works through more specific, hands-on coding examples in Jupyter notebooks. It explores a few cases that illustrate the underlying data types provided by Dask and gives an overview of their trade-offs.
- Understand how graphs represent tasks with dependencies
- Examine tasks in real time using the Dask dashboard
- Assess trade-offs between various Dask data types
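To make the idea of a task graph concrete, here is a minimal sketch using `dask.delayed`. The functions `increment` and `add` are hypothetical placeholders, not examples from the course; the point is only that `add` depends on the two `increment` tasks, which do not depend on each other.

```python
import dask


@dask.delayed
def increment(x):
    # A small unit of work with no dependencies on other tasks.
    return x + 1


@dask.delayed
def add(a, b):
    # Depends on the results of two upstream tasks.
    return a + b


# Calling delayed functions records tasks in a graph instead of running them.
left = increment(1)
right = increment(2)
total = add(left, right)

# The graph executes only when compute() is called.
result = total.compute(scheduler="synchronous")
print(result)
```

Because `left` and `right` are independent, a parallel scheduler is free to run them at the same time. If the optional graphviz dependency is installed, `total.visualize()` can draw the graph.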
Module 3: Pandas + Dask
One of the most useful aspects of Dask is its DataFrame data type, whose API closely resembles that of Pandas. We will work together on a few examples that show where Dask and Pandas are similar, where they differ, and how to use them together effectively.
- Be able to explain the difference between Pandas and Dask DataFrames
- Become familiar with storage options
- Understand nuances with schema
Who will be interested in this course?
This course is geared to data scientists, data engineers, machine learning engineers and software engineers of all levels who wish to gain a deep understanding of Parallel Computing with Dask and Pandas and how to apply it to real-world situations.
What knowledge and skills should you have?
Strong understanding of Python and Pandas required.
Knowledge of Pandas aggregation and core data transformation methods.
Ability to configure a Python environment and install packages.
Familiarity with Jupyter Notebooks.
What is included in your ticket?
- Access to live training and Q&A session with the instructor
- Access to the on-demand recording
- Certificate of completion