Course Abstract

This course covers:

  • The challenges associated with creating ML models that are deployable

  • Software engineering principles that should be applied to ML code to make it easier to deploy in a production environment

  • How you can use an open-source Python library called [Kedro](https://github.com/quantumblacklabs/kedro) to enhance your exploratory data analysis workflow as well as its transition to production-ready code

Kedro is an open-source development workflow framework that implements software engineering best practices for data pipelines with an eye towards productionising machine learning models.

DIFFICULTY LEVEL: ADVANCED

Learning Objectives

  • Learn about the emergence of MLOps and production-level data and ML pipelines

  • Understand the Kedro framework and its basic functionalities

  • Learn how to build a data pipeline through a Kedro demo

Instructor

Instructor Bio:

Software Engineer | QuantumBlack

Kiyohito Kunii

Kiyo is a software engineer at QuantumBlack, an advanced analytics firm operating at the intersection of strategy, technology and design to improve performance outcomes for organizations. Kiyo is one of the core contributors and maintainers of Kedro, a Python library that implements software engineering best practices for data and ML pipelines. Kiyo holds an MSc in Computing Science from Imperial College London and an MA in Economics from The University of Edinburgh.

Course Outline

Module 1: The emergence of MLOps and production-level data and ML pipelines

- Learn about the trends driving interest in production-level data science code

- Get exposure to software principles that data engineers and data scientists should consider applying to their code to make it easier to deploy into a production environment

- You will need a basic understanding of data science; this module is geared to beginners

Module 2: Overview of Kedro

- Learn what Kedro is by going through basic functionalities like the project template, configuration, data catalog and pipeline 

- I'll show how it fits into the workflow for creating robust and reproducible data pipelines 
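
To make the data catalog and pipeline ideas more concrete before the demo, here is a minimal plain-Python sketch. It deliberately does not use the Kedro API: `Node`, `run_pipeline`, `drop_missing`, `mean_price` and the dataset names are hypothetical and exist only for this illustration. The assumption it illustrates is that a pipeline is a set of pure functions wired together by named datasets, and that a catalog maps those names to data.

```python
"""Plain-Python sketch of the catalog + node + pipeline idea (not the Kedro API)."""
from typing import Any, Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """A function plus the names of the datasets it reads and writes."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run each node once its inputs exist, regardless of declaration order."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            missing = sorted(
                {name for n in pending for name in n.inputs if name not in data}
            )
            raise ValueError(f"Unresolvable inputs: {missing}")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data


def drop_missing(rows):
    """Keep only rows that have a price."""
    return [r for r in rows if r["price"] is not None]


def mean_price(rows):
    """Average the price column."""
    return sum(r["price"] for r in rows) / len(rows)


# The "catalog": dataset names mapped to data (in memory here).
catalog = {"raw_rows": [{"price": 10.0}, {"price": None}, {"price": 30.0}]}

# Nodes are declared out of order on purpose; dataset names define the order.
nodes = [
    Node(mean_price, ["clean_rows"], "average_price"),
    Node(drop_missing, ["raw_rows"], "clean_rows"),
]

result = run_pipeline(nodes, catalog)
print(result["average_price"])  # 20.0
```

Because each step is a plain function that only talks to the outside world through named datasets, it can be unit tested on its own and the data source can be swapped without touching the logic. This is the kind of separation the course describes as making code easier to move into production.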

Module 3: Short demo of building a data pipeline with Kedro

- A short demo of how to create a new Kedro project, then build and visualize a data pipeline using an example dataset

Background knowledge

  • This course is for current or aspiring Data Scientists, Machine Learning and MLOps Engineers, and AI Product Managers

  • Knowledge of the following tools and concepts is useful:

  • Basic knowledge of Python and some familiarity with Python data science libraries (e.g. Pandas, Jupyter notebook) is recommended.

  • The course is aimed at data scientists and data engineers who are interested in building production-ready data pipelines.