Live training with David Yerrington starts on February 4th at 1 PM (ET)

Training duration: 3 hours

Price with 10% discount

Regular Price: $210.00

Subscribe now and get a 7-day free trial

Sign Up for a Basic or Premium Plan and Get an Additional 10-35% Discount Off Live Training

Instructor

Data Science Consultant | Yerrington Consulting

David Yerrington

At the age of 8, David began learning the BASIC programming language while living in Alaska's outskirts. He studied music performance but found the beginning of his career building a small software and consulting company in the late '90s. David's career spans almost 20 years, including several startups where, as a lead engineer, he built scalable data services from prototype to production. During his time at Sony/Gracenote, he led the implementation of prototypes featured at the Consumer Electronics Show, spanning recommendation, content classification, and profiling projects. David also held roles as a data scientist at a YC-backed dating app company and at an analytics startup, researching and building scalable recommendation pipelines. While working at General Assembly as a Lead Global Data Science Instructor, David helped architect the first significant versions of their data science immersive curriculum and piloted many of the hybrid Data Science Immersive programs still taught today. Currently, David consults and contracts full-time for various clients and projects, ranging from NLP, recommendation, and big data to professional training for large and small teams. When not working, David enjoys playing the cello in orchestras and in a small group that performs classic video game covers.


By the end of the course, participants will be able to:

  • Understand the types of problems solved with parallel computing

  • Identify the major components of Dask: Collection Types and Scheduler

  • Be familiar with types of parallel processing provided by Dask

  • Understand how graphs represent tasks with dependencies

  • Explain the difference between Pandas and Dask DataFrames

  • Know how to examine graph processes using the scheduler dashboard

Course Abstract

One major problem encountered in the data science world is scalability.  Working on a single computer limits how much and how fast you can process data.  Most real-world datasets are bigger than a single computer can process, so learning a parallel computing framework becomes increasingly necessary to be productive.  In this session, you will learn how to work, hands-on, with the Dask framework to build scalable transformations to support analytic applications.

Course Schedule

Module 1: Intro to Parallel Computing
This module will briefly examine the concept of parallel computing and which of its ideas are most relevant to how Dask works.

- Understand the types of problems solved with parallel computing

- Identify the major components of Dask: its collection types and scheduler

- Be familiar with the types of parallel processing provided by Dask
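The "independent, per-record" class of problem at the heart of parallel computing can be sketched without Dask at all. The example below uses only Python's standard library; the function name `score_record` is illustrative, and a thread pool stands in for the thread- and process-based schedulers that Dask provides locally:

```python
from concurrent.futures import ThreadPoolExecutor

def score_record(x):
    # Stand-in for any independent, per-record computation.
    return x * x

data = range(8)

# Because no record depends on another, the work divides cleanly
# across workers -- the same class of "embarrassingly parallel"
# problem that Dask's local schedulers target.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(score_record, data))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The point of the sketch is the shape of the problem, not the executor: once work is expressed as independent pieces, swapping a thread pool for processes or a cluster is a scheduling detail.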

Module 2: Intro to Dask
This module moves into more specific, hands-on coding examples using Jupyter notebooks. It explores a few cases that fundamentally illustrate the underlying data types provided by Dask, while also reviewing their tradeoffs.

- Understand how graphs represent tasks with dependencies

- Examine tasks in real time using the Dask dashboard

- Assess trade-offs between the various Dask data types
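How a graph represents tasks with dependencies can be shown in plain Python. The dictionary-of-tuples layout below loosely mirrors the spirit of Dask's graph representation (each key maps to a literal value or to a `(function, *dependency_keys)` tuple), but the `get` function here is a simplified stand-in for illustration, not Dask's actual scheduler:

```python
def get(graph, key):
    """Resolve `key` by recursively resolving its dependencies first."""
    task = graph[key]
    if isinstance(task, tuple):
        func, *deps = task
        return func(*(get(graph, d) for d in deps))
    return task  # a literal value with no dependencies

# "double" depends on "add", which depends on the literals "x" and "y".
graph = {
    "x": 1,
    "y": 2,
    "add": (lambda a, b: a + b, "x", "y"),
    "double": (lambda v: 2 * v, "add"),
}

print(get(graph, "double"))  # (1 + 2) * 2 = 6
```

Because dependencies are explicit in the graph, a real scheduler can execute independent keys in parallel and only wait where one task feeds another — which is exactly what the Dask dashboard visualizes.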

Module 3: Pandas + Dask
One of the most useful aspects of Dask is its DataFrame data type, which closely follows the Pandas API. We will work through a few examples showing how Dask and Pandas are similar and how to use them together effectively.

- Be able to explain the difference between Pandas and Dask DataFrames

- Become familiar with storage options

- Understand nuances with schemas
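One way to picture the difference: a Dask DataFrame is, roughly, an ordered collection of Pandas DataFrames (partitions) tied together by a task graph. The sketch below imitates that partitioned, two-phase aggregation pattern using only Pandas (assuming `pandas` is installed); the data is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "b", "a", "b", "a", "b"],
    "value": [1, 2, 3, 4, 5, 6],
})

# A Dask DataFrame would hold chunks like these as separate partitions.
partitions = [df.iloc[:3], df.iloc[3:]]

# Phase 1: aggregate within each partition independently
# (the parallelizable step).
partials = [p.groupby("group")["value"].sum() for p in partitions]

# Phase 2: combine the partial results into the final answer --
# the split/combine pattern behind many Dask DataFrame aggregations.
combined = pd.concat(partials).groupby(level=0).sum()

print(combined.to_dict())  # {'a': 9, 'b': 12}
```

This split/combine structure is also why some Pandas operations translate to Dask cheaply (per-partition work) while others, like sorts or wide joins, are more expensive (they require moving data between partitions).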

Who will be interested in this course?

This course is geared toward data scientists, data engineers, machine learning engineers, and software engineers of all levels who wish to gain a deep understanding of parallel computing with Dask and Pandas and how to apply it to real-world situations.

What knowledge and skills should you have?

A strong understanding of Python and Pandas is required.

Knowledge of Pandas aggregation and core data transformation methods.  

Ability to configure a Python environment and install packages.

Familiarity with Jupyter Notebooks.

What is included in your ticket?

  • Access to the live training and Q&A session with the instructor

  • Access to the on-demand recording

  • Certificate of completion

Access all live training

Upcoming Live Training & Recordings