Course Abstract

Training duration: 90 minutes

Building an efficient data pipeline to apply machine learning models in production has been a challenge for many data science practitioners and software engineers. While model formats have been largely standardized, there is a great variety of data input sources that almost always require customized processing. Streaming data inputs add another layer of complexity: when data has to be consumed continuously, the data pipeline architecture must be carefully designed for reliable production deployment. In TensorFlow 2.0, tf.data was introduced as the canonical way to process data for training and inference with tf.keras models. It simplifies data processing for both static and streaming data sources, making production deployment of machine learning models considerably easier.
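
As a rough illustration of the pattern the course builds on, the minimal sketch below feeds a tf.data.Dataset directly into a tf.keras model. The synthetic in-memory data is a placeholder (not from the course materials) for a real input source such as CSV files or a Kafka stream.

    import tensorflow as tf

    # Synthetic in-memory data standing in for a real input source.
    features = tf.random.normal([1000, 8])
    labels = tf.random.uniform([1000], maxval=2, dtype=tf.int32)

    # tf.data pipeline: shuffle, batch, and prefetch for training throughput.
    dataset = (
        tf.data.Dataset.from_tensor_slices((features, labels))
        .shuffle(1000)
        .batch(32)
        .prefetch(tf.data.experimental.AUTOTUNE)
    )

    # A small tf.keras model consumes the dataset directly in fit().
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(dataset, epochs=3)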

DIFFICULTY LEVEL: INTERMEDIATE

Learning Objectives

  • Understanding the essentials of the TensorFlow framework, including building models with Keras (tf.keras)

  • Understanding the data pipeline for machine learning with TensorFlow (tf.data)

  • Building machine learning data pipelines in production with different input sources

  • Applying machine learning to streaming data in production with TensorFlow and Apache Kafka

Instructor

Yong Tang

Director of Engineering

Yong Tang, Ph.D., is Director of Engineering at MobileIron. He is a core contributor to many open-source projects in the cloud-native and machine learning areas. He is a maintainer and SIG I/O lead of the TensorFlow project and received the Open Source Peer Bonus award from Google for his contributions to TensorFlow. He is also a maintainer of Docker/Moby, the widely used open-source container platform, and a core maintainer of CoreDNS, a Cloud Native Computing Foundation (CNCF) graduated project for service discovery.

Course Outline

Module 1: Understand the essentials of machine learning with TensorFlow.
- Building machine learning models with Keras (tf.keras).
- Understanding the data pipeline for machine learning with TensorFlow (tf.data).

Module 2: Build machine learning data pipelines in production with different input sources (a brief code sketch follows this outline).
- Utilizing columnar data and CSV datasets with TensorFlow.
- Utilizing database SQL queries as input with TensorFlow.
- Consuming streaming data in production with TensorFlow and Apache Kafka.
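
For orientation, the sketch below pairs each Module 2 input source with a dataset API it naturally maps to: tf.data.experimental.make_csv_dataset for CSV files, tf.data.experimental.SqlDataset for SQL queries, and a tensorflow-io Kafka dataset for streaming. The file name, column names, database path, topic name, and output types are placeholders, and the exact Kafka helper used in the course is an assumption.

    import tensorflow as tf
    import tensorflow_io as tfio  # assumed dependency for the Kafka source

    # Columnar/CSV input: a batched dataset straight from a CSV file
    # ("train.csv" and the "label" column name are placeholders).
    csv_ds = tf.data.experimental.make_csv_dataset(
        "train.csv", batch_size=32, label_name="label", num_epochs=1)

    # SQL input: run a query against a SQLite database and yield rows as
    # tensors (path, query, and output types are placeholders).
    sql_ds = tf.data.experimental.SqlDataset(
        "sqlite", "/path/to/data.sqlite",
        "SELECT feature1, label FROM examples", (tf.float64, tf.int32))

    # Streaming input: tensorflow-io exposes a Kafka topic as a dataset;
    # message values usually need a decoding step (e.g. tf.io.decode_csv)
    # before they can be fed to a model.
    kafka_ds = tfio.IODataset.from_kafka("training-topic")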

Background knowledge

  • A basic understanding of machine learning and some Python programming experience are recommended

  • A local Python programming environment is not required (Google Colab will be used instead)

Who this course could be useful for

  • Software developers

  • Data scientists

  • Machine learning engineers