Course Abstract

Most of the recent advances in deep learning come at a high price. The costs of developing and training these models are two-fold: computing power and training data. Computational resources are becoming increasingly affordable thanks to the widespread availability of cloud computing services. Gathering and, especially, manually labeling data, on the other hand, cannot scale in the same way. A common scenario is one in which unlabeled data comes cheap, but the labeling budget is severely limited. Practice shows that not all data is created equal: the choice of which data to prioritise for labeling has a profound effect on the performance of the resulting model. The task of determining which data samples would be most "informative" once labeled is known as active learning.
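
As a rough illustration of what "informative" can mean, the sketch below ranks unlabeled samples by the entropy of the model's predicted class probabilities and picks the most uncertain ones for labeling. This is one common heuristic (uncertainty sampling), not necessarily the method taught in the course. The function names, the example probabilities, and the budget are assumptions made for illustration; in a PyTorch workflow the probabilities would typically be the softmax outputs of the model.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return indices of the `budget` samples with the highest entropy.

    `predictions` holds one class-probability list per unlabeled sample.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,  # most uncertain samples first
    )
    return ranked[:budget]


# Hypothetical softmax outputs of a 3-class model on four unlabeled samples.
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# With a labeling budget of two, pick the two most uncertain samples.
chosen = select_for_labeling(predictions, budget=2)
print(chosen)
```

The selected samples would then be sent for labeling, added to the training set, and the model retrained before the next selection round.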

DIFFICULTY LEVEL: ADVANCED

Learning Objectives

  • Theoretical overview of the ideas behind active learning

  • Applying active learning to an image classification problem

  • How to implement active learning using PyTorch

Instructor

Instructor Bio:

Machine Learning DevOps Engineer | Scaleway

Olga Petrova, PhD

Olga is a deep learning R&D engineer at Scaleway, the second largest French cloud provider. She received her PhD in theoretical physics from Johns Hopkins University in 2013, followed by postdoctoral appointments at the Max Planck Institute in Dresden and the École Normale Supérieure in Paris. At the latter, she looked into the possible applications of artificial intelligence to quantum systems, among other things. Olga’s current interests focus on semi-supervised and active machine learning. On the community side, she enjoys blogging about the latest advancements in AI both in and out of working hours. Some of her writing can be seen on medium.com/@olgapetrova_92798

Background knowledge

  • This course is for current and aspiring Data Scientists, Deep Learning Engineers, AI Product Managers and Application Developers

  • Knowledge of the following tools and concepts is useful:

  • Standard supervised machine learning concepts

  • Good understanding of the PyTorch deep learning framework is recommended

  • torch, torchvision

Real-world applications

  • Active learning techniques have been at the forefront of self-driving technologies used by companies like Toyota, Voyage, and Lyft.

  • Robotics and automation companies like OpenAI, Skydio, and even General Motors employ data annotation at scale in data science applications.

  • Data labeling and active learning are widely practiced on platforms like Pinterest and Airbnb to recognize images, translate languages, and generate realistic text.