5 Essential Machine Learning Safety Topics For Better AI

Organizations increasingly rely on machine learning models both to develop strategic advantages and to power their consumer-facing products. As a result, protecting data and models has become increasingly important. The videos below show how you can implement better machine learning safety practices and use better tools to keep your data and your organization safe from threats.

1. Is Your ML Secure? Cybersecurity and Threats in the ML World

Just like any other piece of software, machine learning models are vulnerable to attacks from malicious agents. However, data scientists and ML engineers rarely think about the security of their models.

Models are representations of their underlying training datasets, so they are susceptible to attacks that can compromise the privacy and confidentiality of that data.

Every single step in the machine learning lifecycle is susceptible to various security threats. But there are steps you can take. Watch this session to learn more about the most common types of attacks targeting the integrity, availability, and confidentiality of machine learning models, as well as the best practices for data scientists and ML engineers to mitigate security risks.

Speakers: Dr. Hari Bhaskar, Ph.D., Director - Data Science & AI Platform and Jean-Rene Gauthier, Ph.D., AI Platform Architect, Oracle

2. Analyzing Sensitive Data Using Differential Privacy

As we discover more AI and machine learning applications in fields like healthcare, biotechnology, and pharma, it’s becoming increasingly important that we find and use effective means of protecting the individuals represented in the data. One such strategy is differential privacy, which provides a mathematical guarantee that limits how much anyone can learn about whether a single individual’s record was included in a dataset. This session will introduce you to differential privacy and illustrate how it can be used to gain insights from particularly sensitive data.

Speakers: Ashwin Machanavajjhala, Ph.D., Associate Professor and Co-Founder, Duke University | Tumult Labs, and Michael Hay, Ph.D., Associate Professor | Founder & CTO, Colgate University | Tumult Labs
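
To make the idea of differential privacy a little more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is not taken from the session and is not production-grade: the function names, the toy `ages` list, and the choice of `epsilon` are all illustrative assumptions, and real deployments typically rely on a vetted library rather than hand-rolled noise sampling.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponential draws."""
    # expovariate takes a rate; rate = 1/scale gives an exponential with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1: adding or removing one record changes
    the true count by at most 1. Laplace noise with scale 1/epsilon therefore
    gives epsilon-differential privacy for this single release.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: patient ages (illustrative only).
ages = [34, 51, 67, 45, 72, 29, 58, 63]
rng = random.Random(0)  # seeded only so the sketch is repeatable

noisy = private_count(ages, lambda age: age >= 60, epsilon=0.5, rng=rng)
print(f"Noisy count of patients aged 60+: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives a more accurate answer with weaker protection. Each additional release spends more of the privacy budget, which this sketch does not track.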

3. Open-source Tools for Synthetic Data On-Demand

Consider the challenges you could tackle if you could remove the three most common bottlenecks to modern data workflows – limited, low-quality, and unsafe data. Advanced synthetics enable you to generate high-fidelity, artificial data on-demand from limited samples, as well as turn existing sensitive datasets into secure, shareable resources that are provably private by design.

This workshop will walk you through several real-world use cases for synthetic data. You’ll learn how to balance a biased medical dataset to improve early cancer detection in women, generate realistic time-series financial data for forecasting, and more. You can test the examples yourself – some with Gretel-synthetics, a fully open-source package, and some using Gretel Blueprints, a collection of notebooks and sample code that leverage the open-source package through Gretel’s client.

Speaker: Lipika Ramaswamy, Senior Applied Scientist, Gretel.ai

4. ImageNet and its Discontents. The Case for Responsible Interpretation in ML

Sociotechnical systems offer abundant examples of harm to historically marginalized groups. Meanwhile, the field of machine learning has seen a rapid proliferation of new methods, model architectures, and optimization techniques. Yet data, which remains the backbone of machine learning research and development, has received comparatively little research attention.

The speaker’s research hypothesis is that focusing exclusively on the content of training datasets (the data algorithms use to “learn” associations) captures only part of the problem. Instead, we should identify the historical and conceptual conditions that reveal how datasets are constructed. This session will provide an analysis of datasets from the perspective of three techniques of interpretation: genealogy, problematization, and hermeneutics.

Speaker: Razvan Amironesei, Ph.D., Applied Data Ethicist and Visiting Researcher, Google

5. Evaluating, Interpreting, and Monitoring Machine Learning Models

Machine learning models have revolutionized several fields, including search and recommendation, finance, healthcare, and the fundamental sciences. Unfortunately, much of this progress has come with models becoming more complex and opaque. Despite widespread deployment, the practice of evaluating models remains limited to computing aggregate metrics on held-out test sets. This session will argue that this practice can fail to surface failure modes that show up during real-world usage.

Speaker: Ankur Taly, Ph.D., Staff Research Scientist, Google
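
As a small illustration of how an aggregate metric can hide a failure mode, the sketch below compares overall accuracy with accuracy computed per slice of the test set. It is not the speaker's method: the `device` slicing field, the row layout, and the numbers are hypothetical, chosen only to show the effect.

```python
from collections import defaultdict


def accuracy(pairs):
    """Fraction of (label, prediction) pairs that match."""
    pairs = list(pairs)
    return sum(1 for label, pred in pairs if label == pred) / len(pairs)


def accuracy_by_slice(rows, slice_key):
    """Group rows by the value of `slice_key` and compute accuracy per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[slice_key]].append((row["label"], row["pred"]))
    return {name: accuracy(pairs) for name, pairs in groups.items()}


# Hypothetical held-out set: 90 "desktop" rows and 10 "mobile" rows.
rows = (
    [{"device": "desktop", "label": 1, "pred": 1} for _ in range(90)]
    + [{"device": "mobile", "label": 1, "pred": 0} for _ in range(6)]
    + [{"device": "mobile", "label": 1, "pred": 1} for _ in range(4)]
)

overall = accuracy((row["label"], row["pred"]) for row in rows)
per_slice = accuracy_by_slice(rows, "device")

print(f"overall accuracy: {overall:.2f}")
for name, value in per_slice.items():
    print(f"{name} accuracy: {value:.2f}")
```

In this toy data the overall accuracy is 94%, yet the model is right on only 4 of the 10 mobile rows. The small slice barely moves the aggregate number, which is why per-slice reporting can surface problems that a single held-out metric misses.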

ODSC West Conference (November 1st - 3rd)

200 speakers, 80 training sessions and workshops