ODSC Conference Courses

Learn from the best ODSC conference speakers. Create your free account and start your learning journey!

1. DS/AI for Incident Response & Threat Hunting with CHRYSALIS & DAISY

There is a lot of talk about the use of AI in cybersecurity these days. Many cybersecurity vendors claim that their products use AI to detect and stop threats, but very little information is available on how they do it.
Focusing specifically on Incident Response and Threat Hunting: what does it take to transform traditional Threat Hunters and Forensicators into AI-enhanced ones, so they can unleash the power of AI in their day-to-day investigations?
In this talk, SANS Senior DFIR Instructor Jess Garcia shows how to use AI transparently in Incident Response and Threat Hunting with the help of the DS4N6 toolset (DAISY VM & CHRYSALIS), and covers the most useful ML algorithms for this purpose.

Speaker: Jess Garcia, CEO, Security & Forensics Analyst, Incident Responder | Senior Instructor, One eSecurity | SANS Institute

2. Denoising Diffusion-based Generative Modeling

Diffusion-based generative models such as DALL·E 2 have achieved exceptional image generation quality. Unlike generative models based on explicit representations of probability distributions (e.g., autoregressive models) or implicit sampling procedures (e.g., GANs), diffusion models directly learn the vector field of gradients of the data distribution (scores). This framework allows flexible architectures and requires neither sampling during training nor adversarial training methods. Score-based generative models enable exact likelihood evaluation, achieve state-of-the-art sample quality, and can be used to improve performance in a variety of inverse problems, including medical imaging.
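As a toy illustration of the "score" idea: for a 1-D standard Gaussian, the score (the gradient of the log-density) is simply -x, and Langevin dynamics can sample from the distribution by following it. This is only a sketch; real score-based diffusion models learn this vector field with a neural network across many noise levels.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    # For a standard normal target, the score (gradient of log p(x)) is -x.
    return -x

# Langevin dynamics: repeatedly follow the score plus injected noise.
# Starting far from the target, the samples drift toward N(0, 1).
x = rng.normal(5.0, 1.0, 10_000)  # initial samples, deliberately off-target
step = 0.1
for _ in range(1_000):
    x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
```

After enough steps, the empirical mean and standard deviation of `x` are close to 0 and 1, the parameters of the target distribution.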

Speaker: Stefano Ermon, PhD, Assistant Professor, Stanford University

3. Orchestrating Data Assets instead of Tasks, with Dagster

Data practitioners use orchestrators to schedule and run the computations that keep data assets, like datasets and ML models, up-to-date.

Traditional orchestrators think in terms of “tasks”. This talk discusses an alternative, declarative approach to data orchestration that puts data assets at the center. This approach, called “software-defined assets”, is implemented in Dagster, an open source data orchestrator.

In traditional data platforms, code and data are only loosely coupled. As a consequence, deploying changes to data feels dangerous, backfills are error-prone and irreversible, and it’s difficult to trust data, because you don’t know where it comes from or how it’s intended to be maintained. Each time you run a job that mutates a data asset, you add a new variable to account for when debugging problems.

Dagster proposes an alternative approach to data management that tightly couples data assets to code: each table or ML model corresponds to the function that's responsible for generating it. This results in a "Data as Code" approach that mimics the "Infrastructure as Code" approach central to modern DevOps. Your git repo becomes the source of truth for your data, so pushing data changes feels as safe as pushing code changes. Backfills become easy to reason about. You trust your data assets because you know how they're computed and can reproduce them at any time. The role of the orchestrator is to ensure that the physical assets in the data warehouse match the logical assets defined in code, so each job run is a step towards order.

Asset-based orchestration works well with modern data stack tools like dbt, Meltano, Airbyte, and Fivetran, because those tools already think in terms of assets.

Attendees of this session will learn how to build and maintain data pipelines in a way that makes their datasets and ML models dramatically easier to trust and evolve.
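The asset-centric idea can be sketched in a few lines of plain Python. This is a toy illustration only, not Dagster's actual API (which uses an @asset decorator and a materialize function), and the asset names are made up for the example.

```python
# Toy sketch of asset-centric orchestration: each asset is a function,
# registered so an "orchestrator" can recompute it from code on demand.
ASSETS = {}

def asset(fn):
    """Register a function as the definition of a data asset."""
    ASSETS[fn.__name__] = fn
    return fn

@asset
def raw_orders():
    # In a real pipeline this would load from a source system.
    return [{"id": 1, "amount": 20}, {"id": 2, "amount": 35}]

@asset
def order_totals():
    # Depends on raw_orders: rerunning this function reproduces the
    # derived "table" from code, which is the source of truth.
    return sum(row["amount"] for row in raw_orders())

def materialize(name):
    # The orchestrator's job: make the physical asset match its code definition.
    return ASSETS[name]()
```

Because every asset is just the output of its registered function, "backfilling" is simply calling `materialize` again, and lineage is visible in the code itself.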

Speaker: Sandy Ryza, Lead Engineer - Dagster Project, Elementl

4. An Intuition-Based Approach to Reinforcement Learning

Reinforcement learning (RL) has achieved remarkable success across a range of tasks: defeating all-human teams in massively multiplayer games, advancing robotics, and producing astonishing results on the protein folding problem in chemistry. Expertise in RL requires strong knowledge of machine learning, statistics, and several areas of mathematics. Moreover, RL contains many concepts that seem "fuzzy" and can therefore be challenging for beginners. This session provides intuition for various RL concepts, such as the exploit/explore trade-off and maximization of expected reward, along with real-life examples of these concepts.

Attendees will also see a comparison of greedy versus epsilon-greedy strategies, and learn why epsilon-greedy can solve tasks that a purely greedy approach cannot. Some of these concepts will be illustrated with the n-chain task in RL, whose solution clearly requires an epsilon-greedy algorithm. The target audience for this session is beginners who have no experience with RL.
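As a minimal sketch of the greedy versus epsilon-greedy distinction (illustrative code, not material from the talk):

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """With probability epsilon, explore a random action;
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Pure greedy is the special case epsilon = 0: it never explores,
# so it can get stuck on whichever action happened to look best first.
greedy = lambda q: epsilon_greedy_action(q, 0.0)
```

In tasks like n-chain, where the high reward sits at the end of a sequence of seemingly unrewarding moves, the occasional random exploration is exactly what lets the agent discover the better policy.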

Speaker: Oswald Campesato, Founder | AI Adjunct Instructor, iQuarkt | UCSC

5. Cybersecurity and Policing in the Metaverse

You can buy virtual assets in the Metaverse: real estate, investment commodities, stocks. This of course means that the Metaverse will need security and policing.

The session also covers AI's role in electric vehicles. Topics include:
- About the speaker and his R&D lab near Silicon Valley
- AI in electric vehicles and the increasing complexity of mobility devices
- Areas of AI potential in EVs
- Tesla's self-driving sensors and self-driving system
- EV fuzzy controllers, and what fuzzy logic is
- A current R&D project, with a short video of the car project

Speaker: Jack McCauley, Board Trustee, University of California, Berkeley; Former Co-founder and Engineer, Oculus VR; Faculty Member, Jacobs Institute; McCauley Chair in Drug Policy Innovation, RAND Corporation; MSRI Trustee; Black Lab LLC

6. Riding the Tailwind of NLP Explosion

We ingest 2 million documents monthly at CB Insights (CBI) to empower tech decision-makers and researchers. From raw data to insights, the R&D team takes on many holy grail challenges, a major one being how to extract relevant information with scale, speed, and precision.

When we started at CBI, NLP was still prehistoric: the "bag of words" walked the earth. Fast forward ten years, and the birth of the "attention mechanism" created an NLP explosion and a strong tailwind for teams big and small to ride.

In this talk, we'll share how we modernized our NLP stack at CBI R&D and the challenges we met along the way. Part I walks through the timeline and milestones of NLP's evolution, highlighting significant trends after the "attention" revolution. Part II discusses battle-tested lessons from using transformer models across various tasks and languages, leveraging open-source libraries such as HuggingFace Transformers and PyTorch Lightning.
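The "attention mechanism" behind that explosion is compact enough to sketch directly. Below is a toy scaled dot-product attention in NumPy, for intuition only; the production stack described in the talk relies on libraries like HuggingFace Transformers rather than hand-rolled code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of the value vectors,
    # with weights set by query-key similarity.
    return weights @ V
```

Each token's output becomes a similarity-weighted blend of every other token's representation, which is what lets transformers model long-range context that "bag of words" methods discard.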

Speaker: Rongyao Huang, Lead Data Scientist, CB Insights

7. Causal AI

Causal inference is increasingly an indispensable tool of data science, machine learning, and data-driven decision-making. In this talk I will present the state of play in causal machine learning. I will cover the problems that matter in practice, with an emphasis on the tech and retail industries, and discuss trends in open-source tools for causal inference. Finally, I'll show examples from DoWhy and its sister package EconML, which together form the PyTorch of causal inference.
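To give a flavor of the kind of problem these tools address, here is a toy confounding example with made-up numbers (plain backdoor adjustment worked by hand, not DoWhy's API): a confounder Z drives both the treatment T and the outcome Y, so the naive comparison of treated versus untreated overstates the true effect.

```python
# Z is a confounder influencing both treatment T and outcome Y (toy numbers).
P_Z = {0: 0.5, 1: 0.5}          # P(Z = z)
P_T_given_Z = {0: 0.2, 1: 0.8}  # P(T = 1 | Z = z): Z makes treatment likelier
# E[Y | T = t, Z = z]: the true causal effect of T on Y is 0.1.
E_Y = {(t, z): 0.1 * t + 0.5 * z for t in (0, 1) for z in (0, 1)}

def e_y_given_t(t):
    """Observational E[Y | T = t]: averages over P(Z | T = t) via Bayes' rule."""
    p_t = sum(P_Z[z] * (P_T_given_Z[z] if t else 1 - P_T_given_Z[z]) for z in (0, 1))
    return sum(
        E_Y[(t, z)] * P_Z[z] * (P_T_given_Z[z] if t else 1 - P_T_given_Z[z]) / p_t
        for z in (0, 1)
    )

# Naive contrast mixes in Z's effect and is biased (0.4 instead of 0.1).
naive = e_y_given_t(1) - e_y_given_t(0)
# Backdoor adjustment: average the within-stratum contrasts over P(Z).
adjusted = sum(P_Z[z] * (E_Y[(1, z)] - E_Y[(0, z)]) for z in (0, 1))
```

The adjusted estimate recovers the true effect of 0.1; libraries like DoWhy automate identifying which variables to adjust for and estimating such effects from data.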

Speaker: Robert Osazuwa Ness, PhD, Senior Researcher, Microsoft

8. Emerging Approaches to AI Governance: Tech-Led vs Policy-Led

Over the past few years, many have become more familiar with the risks posed by the improper deployment and usage of AI/ML systems. Companies of almost all sizes and across almost all sectors have seen examples of major AI failures, leading to a significant erosion of trust in these systems. As a result, stakeholders across organizations have taken an interest in remediating these risks and getting a handle on AI -- in owning AI governance. Some are drawn to technical capabilities that promise solutions to ethical problems and enable quality. Others rely on existing compliance and policy methods to enforce standards. In this session, we will describe what these different approaches look like, the pros and cons of each, and considerations for building a robust AI governance framework that engages technical, business, and compliance teams.

Speaker: Ilana Golbin, Director, PwC Emerging Technologies and Responsible AI Lead

9. Cloud Directions, MLOps and Production Data Science

Cloud computing promises to simplify infrastructure, but somehow MLOps remains deeply technical, even in the cloud.  The complexity of MLOps tends to lead to an organizational antipattern: data scientists who know the data and models best have to mind-meld with data engineers who know the infrastructure best. This is particularly problematic in the highest-value stage of the ML lifecycle — managing models in production.

Recent trends in cloud technology, including serverless computing, promise new approaches for abstracting away infrastructure. Unfortunately these offerings fall short of the challenge of MLOps. In this talk I will cover some of the important promises and weaknesses of current cloud offerings, and describe research from Berkeley's RISElab and the resulting open source Aqueduct system, which are putting Production Data Science at the fingertips of anyone working with data and models.

Speaker: Joe Hellerstein, PhD, Jim Gray Professor of Computer Science, University of California, Berkeley

10. Robust and Equitable Uncertainty Estimation

Machine learning provides us with an amazing set of tools to make predictions, but how much should we trust particular predictions? To answer this, we need a way of estimating the confidence we should have in particular predictions of black-box models. Standard tools for doing this give guarantees that are averages over predictions. For instance, in a medical application, such tools might paper over poor performance on one medically relevant demographic group if it is made up for by higher performance on another group. Standard methods also depend on the data distribution being static — in other words, the future should be like the past.

In this lecture, we will describe a new technique that addresses both problems: a way to produce prediction sets for arbitrary black-box prediction methods that maintain correct empirical coverage even when the data distribution changes in arbitrary, unanticipated ways, and even when we zoom in on demographic groups that may be arbitrary and intersecting.
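For contrast, the baseline this improves on, standard split conformal prediction, fits in a few lines. The sketch below uses a toy regression problem and a hypothetical "model"; note that it provides only average coverage and assumes a static distribution, exactly the limitations the talk's method lifts.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(x):
    """A stand-in black-box model; conformal methods only need its residuals."""
    return 2.0 * x

# Held-out calibration data drawn from the same distribution as future data.
x_cal = rng.uniform(0, 1, 500)
y_cal = 2.0 * x_cal + rng.normal(0, 0.1, 500)

# Nonconformity scores: how far the model's predictions miss on calibration data.
scores = np.abs(y_cal - predict(x_cal))
n = len(scores)
# Calibrated threshold targeting ~90% coverage on exchangeable future points.
q = np.quantile(scores, np.ceil(0.9 * (n + 1)) / n)

def prediction_interval(x):
    """Interval around the black-box prediction, sized by the calibrated q."""
    return predict(x) - q, predict(x) + q
```

On fresh data from the same distribution, roughly 90% of true outcomes fall inside these intervals; the guarantee is an average over all predictions, which is precisely why it can hide poor coverage on particular subgroups.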

Speaker: Aaron Roth, PhD, Professor of Computer and Cognitive Science, University of Pennsylvania