ODSC West 2020: GPU-accelerated Data Science with RAPIDS
The PyData ecosystem has grown to millions of data science users, who appreciate its ease of use, consistent syntax, and breadth of features. Traditionally, PyData frameworks ran only on CPUs, making it difficult for users to take advantage of the increasingly powerful GPUs that have already revolutionized deep learning and related fields. In this session, we'll introduce RAPIDS, an open-source framework that brings transparent GPU backends to popular Python APIs, such as those from pandas, scikit-learn, and NetworkX. We'll show how you can port a huge range of existing workloads to GPU in a matter of minutes and get speedups on the order of 40x or more on common tasks.
The session will emphasize both data preparation (ETL) and machine learning operations, with a hands-on demonstration of porting a typical workflow from CPU to GPU and measuring the speedup. We'll go into more detail on real-world applications that take advantage of these speed improvements, including hyperparameter optimization for machine learning models, single-cell genomics analysis, and applications in finance. For large-data users, we'll discuss some of the options for scaling RAPIDS to multiple GPUs or multiple nodes, emphasizing the tight integration with the Dask ecosystem.
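As a rough illustration of the kind of port described above, the following sketch runs one group-by aggregation on pandas and, when cuDF (the RAPIDS DataFrame library) can be imported, repeats it on the GPU and reports the ratio of the two timings. It is not taken from the workshop materials: the synthetic data, the column names `key` and `value`, and the helpers `summarize` and `timed` are all assumptions made for the example. A single timed call is also a crude measurement, since the first GPU call may include one-time startup cost.

```python
import time

import numpy as np
import pandas as pd

try:
    import cudf  # RAPIDS GPU DataFrame library; requires an NVIDIA GPU
except Exception:  # not installed, or no usable GPU on this machine
    cudf = None


def summarize(df):
    """Hypothetical workload: mean of `value` per `key`.

    Written once against the DataFrame API that pandas and cuDF share.
    """
    return df.groupby("key")["value"].mean().sort_index()


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Synthetic data (an assumption for this sketch, not workshop data).
rng = np.random.default_rng(0)
n_rows = 1_000_000
cpu_df = pd.DataFrame({
    "key": rng.integers(0, 100, size=n_rows),
    "value": rng.random(n_rows),
})

cpu_result, cpu_seconds = timed(summarize, cpu_df)
print(f"pandas: {cpu_seconds:.4f} s")

if cudf is not None:
    # Copy the same data to GPU memory and rerun the unchanged function.
    gpu_df = cudf.DataFrame.from_pandas(cpu_df)
    gpu_result, gpu_seconds = timed(summarize, gpu_df)
    print(f"cuDF:   {gpu_seconds:.4f} s "
          f"(speedup {cpu_seconds / gpu_seconds:.1f}x)")
    # Both backends should agree up to floating-point rounding.
    assert np.allclose(cpu_result.to_numpy(),
                       gpu_result.to_pandas().to_numpy())
else:
    print("cuDF not available; skipped the GPU run.")
```

The point of the sketch is that `summarize` is not modified between the two runs; only the DataFrame passed to it changes. Actual speedups depend heavily on data size, the operation, and the hardware, so the figure printed here should not be read as representative.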
Workshop Overview and Author Bio
Before you get started: Prerequisites and Resources
GPU-accelerated Data Science with RAPIDS
John Zedlewski
Corey Nolet