Course curriculum

  • 1

    Prepare Data Science/ML Pipelines with Ease, Speed Following Best Practices

    • Abstract and Bio

    • Prepare Data Science/ML Pipelines with Ease, Speed Following Best Practices By Ido Michael

  • 2

    Data Science in the Cloud-Native Era

    • Abstract and Bio

    • Data Science in the Cloud-Native Era

  • 3

    Drift Detection in Structured and Unstructured Data

    • Abstract and Bio

    • Drift Detection in Structured and Unstructured Data by Keegan Hines

  • 4

    Simplifying MLOps by Taking Storage Worries out of the Equation

    • Abstract and Bio

    • Simplifying MLOps by Taking Storage Worries out of the Equation

Abstracts and Speakers

Prepare Data Science/ML Pipelines with Ease, Speed Following Best Practices

Existing ETL and MLOps tools claim to solve orchestration problems, but none of them does it the right way. In this hands-on workshop, we’ll go through a sample standard ML data pipeline that represents the typical data science use case: extracting data from multiple data sources (DB and DWH), transforming it, viewing it, and cleaning it. Then we’ll make sure it meets the quality standards and start training the model. During each of these phases, we will talk about testing (unit/integration tests). As a pre-pipeline step, we’ll cover optional data preparation flows and some strategies to accelerate the whole process through quality gates, data testing, and some of the labeling services out there.


  Ido Michael, CTO @ Ploomber

Data Science in the Cloud-Native Era

In recent years, data science has made tremendous progress, yet designing large-scale data science and machine learning applications remains challenging. The variety of machine learning frameworks, hardware accelerators, and cloud vendors, as well as the complexity of data science workflows, brings new challenges to MLOps. It’s non-trivial for data scientists to launch, manage, monitor, and optimize their pipelines in a scalable way. On the other hand, Kubernetes and containerization have revolutionized cloud applications in a manner not seen since Linux and virtualization's disruption of the server market. In this talk, we’ll provide an overview of the existing tools available and best practices to do MLOps effectively in the cloud-native era.


  Yuan Tang, Founding Engineer @ Akuity | Co-chair @ Kubeflow

Drift Detection in Structured and Unstructured Data

Machine learning systems in production are subject to performance degradation due to many external factors, and it is vital to actively monitor system stability and integrity. A common source of model degradation is the inherent non-stationarity of the real-world environment, commonly referred to as data drift. In this presentation, I will describe how to reliably quantify data drift in a variety of data paradigms, including tabular data, computer vision data, and NLP data. Attendees of this talk will come away with a conceptual toolkit for thinking about data stability monitoring in their own models, with example use cases in common settings as well as in more challenging regimes.


  Keegan Hines, PhD, VP of ML @ ArthurAI | Adjunct Professor @ Georgetown | Chair @ CAMLIS

Simplifying MLOps by Taking Storage Worries out of the Equation

When it comes to MLOps, storage and data are related but far from the same. So why is a storage company doing yet another MLOps talk at GTC this year? We're here to help you focus on data and not think about storage. We're going to do this in two ways: First, we'll show you how storage can get out of the way of data science. Second, we'll show you how a modern data experience works to streamline machine learning and inference operations. It's important to think about how to scale simplicity and performance across all the various components of an AI infrastructure. The right solution gets the storage out of the way of the data science and allows data scientists to focus on the DATA. It also simplifies life for the infrastructure team, enabling simple, trouble-free operation and automation that moves toward Infrastructure-as-Code.


  Miroslav Klivansky, Field Solution Evangelist – AI and Analytics @ Pure Storage