When Privacy Meets AI - Your Kick-Start Guide to Machine Learning with Synthetic Data

Alexandra Ebert, Chief Trust Officer, MOSTLY AI | Chair of the IEEE Synthetic Data IC Expert Group, IEEE Standards Association | AI, Privacy & GDPR Expert | #humanAIze

Over 80% of all AI projects fail, and a huge chunk of projects never even gets started due to privacy constraints. That is why Gartner predicts that by 2024, 60% of all machine learning training data will be synthetic. High time to kick-start your synthetic data (SD) journey! Join Alexandra for a hands-on tutorial on synthetic data fundamentals to learn how to create synthetic data you can trust, assess its quality, and use it for privacy-preserving ML training. As a bonus, we’ll look into boosting your ML performance with smart upsampling.
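To make the upsampling idea concrete, here is a minimal sketch of the simplest baseline: random oversampling of the minority class by duplication. The talk's "smart upsampling" would instead generate new synthetic rows, but the mechanics of rebalancing are the same; the function name and parameters below are illustrative, not from the tutorial.

```python
import numpy as np

def upsample_minority(X, y, target_ratio=1.0, seed=0):
    """Duplicate minority-class rows (with replacement) until the
    minority/majority ratio reaches target_ratio. A synthetic-data
    generator would create new, privacy-safe rows here instead."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = int(counts.max() * target_ratio) - counts.min()
    if n_needed <= 0:
        return X, y
    idx = rng.choice(np.flatnonzero(y == minority), size=n_needed, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

# Toy 90/10 imbalanced dataset: 9 majority rows, 1 minority row.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 9 + [1])
X_up, y_up = upsample_minority(X, y)  # now 9 rows of each class
```

Duplication rebalances class counts but adds no new information and can amplify overfitting to the few minority rows, which is precisely the gap a synthetic-data generator aims to close.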

Reasoning in Natural Language

Dan Roth, PhD, Professor, University of Pennsylvania | VP & Distinguished Scientist, Amazon AWS

The fundamental issue underlying natural language understanding is that of semantics: it involves grounding surface representations of language in meaning and truth. The rapid progress made over the last few years in generating linguistically coherent natural language has blurred, in the minds of many people, the difference between natural language generation, understanding, and the ability to reason about the world. Nevertheless, robust support for high-level decisions that depend on natural language understanding and require dealing with “truthfulness” is still beyond our capabilities, partly because most of these tasks are very sparse, often require grounding, and may depend on new types of supervision signals.

I will discuss some of the challenges underlying reasoning: making natural language understanding decisions that depend on multiple, interdependent models. I will also present some of our work in this space, from supporting reasoning via decomposition to applications in navigating information pollution.

Applying Responsible AI with Open-Source Tools

David Talby, PhD, CTO, John Snow Labs

While a lot of work has gone into defining the risks, goals, and policies for Responsible AI, less is known about what you can apply today to build safe, fair, and reliable models. This session introduces open-source tools, with examples of using them in real-world projects, to address four common challenges.

The first is robustness: testing & improving a model's ability to handle accidental or intentional minor changes in input that can uncover model fragility and failure points. The second is detecting & fixing labeling errors, which impose an upper limit on accuracy and exist in most widely used datasets. The third is bias: testing that a model performs equally well across gender, age, race, ethnicity, or other critical groups. The fourth is data leakage, in particular when combined with privacy leakage caused by using personally identifiable information in training data.
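The robustness challenge above can be sketched in a few lines: perturb each input with small typo-like edits and measure how often the model's prediction flips. This is a hedged, minimal stand-in for what dedicated test suites do; the function names, the single adjacent-swap perturbation, and the keyword-matching toy model are all illustrative assumptions, not tools from the session.

```python
import random

def perturb(text, rng):
    """Swap two adjacent characters: a crude stand-in for the richer
    perturbations (typos, casing, paraphrases) real test suites apply."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def flip_rate(model, texts, n_trials=20, seed=0):
    """Fraction of perturbed inputs whose prediction differs from the
    prediction on the clean input -- a simple fragility score."""
    rng = random.Random(seed)
    flips = total = 0
    for t in texts:
        base = model(t)
        for _ in range(n_trials):
            total += 1
            flips += model(perturb(t, rng)) != base
    return flips / total

# Toy "model": keyword spotting, brittle to typos by construction.
toy_model = lambda s: "pos" if "good" in s else "neg"
rate = flip_rate(toy_model, ["a good movie", "a bad movie"])
```

A robust model keeps `rate` near zero; a high rate flags fragility worth investigating before deployment.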

This session is intended for data science practitioners and leaders who need to know what they can & should do today to build AI systems that work safely & correctly in the real world.