Course curriculum
The long-term success of machine learning relies on consistently labeled, high-quality data. While most machine learning initiatives begin in the lab, they take on a life of their own and can create significant challenges once they scale. ML data ops practitioners can find themselves consumed by the logistics of data annotation and management instead of focusing on the science. Wherever you are in your team's machine learning journey, you must think about evolving towards large-scale production. Proactively planning a data management strategy can generate progressively better results, but it requires thought and stakeholder buy-in.

A key ingredient of this journey is your data labeling and annotation framework. A data pipeline designed for human judgment and incremental training on edge cases provides that last mile of acceptability, enabling the machine learning solution to go to production. This session will reveal the implications of a live data loop in a production environment and how it significantly impacts the customer experience. Attendees will also take away trends and challenges in combining humans with the machine learning pipeline.

In this session, iMerit's Jai Natarajan reveals best practices for building scalable and repeatable data labeling pipelines with a balance of tools and humans-in-the-loop. Through collaboration with peers, managers, and machine learning experts, data annotators refine their skills and master tasks well beyond the expertise of crowdsourcing. In a collaborative framework, annotators and ML experts negotiate and create meaning through an iterative feedback process as they identify new concepts and nuances in the data. Attendees will learn concepts such as designing to break the ML, edge-case knowledge management, and workflow management.
1. Best Practices for Data Annotation at Scale
Instructor
Vice President, Strategic Business Development, iMerit
Jai Natarajan