Abstract

1. Probability & Information Theory


This class, Probability & Information Theory, introduces the mathematical fields that enable us to quantify uncertainty and to make predictions despite it. These fields are essential because machine learning algorithms are both trained on imperfect data and deployed into noisy, real-world scenarios they haven’t encountered before.

Through the measured exposition of theory paired with interactive examples, you’ll develop a working understanding of random variables, probability distributions, metrics for assessing distributions, and graphical models. You’ll also learn how to use information theory to measure how much meaningful signal there is within some given data. The content covered in this class is itself foundational for several other classes in the Machine Learning Foundations series, especially Intro to Statistics and Optimization.


Over the course of studying this topic, you'll:

  • Develop an understanding of what’s going on under the hood of predictive statistical models and machine learning algorithms, including those used for deep learning.
  • Understand the appropriate variable type and probability distribution for representing a given class of data, as well as the standard techniques for assessing the relationships between distributions.
  • Apply information theory to quantify the proportion of valuable signal that’s present amongst the noise of a given probability distribution (a brief sketch of this idea follows this list).
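
As a rough preview of that last point, here’s a minimal Python sketch (the coin probabilities are illustrative assumptions, not course material) showing that a heavily biased coin carries less information per flip, i.e. less Shannon entropy, than a fair one:

    import numpy as np

    def shannon_entropy(p):
        """Entropy in bits of a discrete distribution given as an array of probabilities."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                      # treat 0 * log(0) as 0
        return -np.sum(p * np.log2(p))

    fair_coin = [0.5, 0.5]                # maximally uncertain: 1 bit per flip
    biased_coin = [0.95, 0.05]            # mostly predictable: far less information per flip

    print(f"H(fair)   = {shannon_entropy(fair_coin):.3f} bits")
    print(f"H(biased) = {shannon_entropy(biased_coin):.3f} bits")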


2. Intro to Statistics


This class, Intro to Statistics, builds on probability theory to enable us to quantify our confidence about how distributions of data are related to one another.

Through the measured exposition of theory paired with interactive examples, you’ll develop a working understanding of the essential statistical tests for assessing whether data are correlated with each other or sampled from different populations; these tests frequently come in handy for critically evaluating the inputs and outputs of machine learning algorithms. You’ll also learn how to use regression to make predictions about the future based on training data.

The content covered in this class builds on the content of other classes in the Machine Learning Foundations series (linear algebra, calculus, and probability theory) and is itself foundational for the Optimization class.


Over the course of studying this topic, you'll:

  • Develop an understanding of what’s going on under the hood of predictive statistical models and machine learning algorithms, including those used for deep learning.
  • Hypothesize about and critically evaluate the inputs and outputs of machine learning algorithms using essential statistical tools such as the t-test, ANOVA, and R-squared (a short t-test sketch follows this list).
  • Use historical data to predict the future using regression models that take advantage of frequentist statistical theory (for smaller data sets) and modern machine learning theory (for larger data sets), including when it makes sense to apply deep learning to a given problem.
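
To make the statistical-testing objective concrete, here’s a minimal sketch of a two-sample t-test using SciPy; the two “model error” samples are made-up data, purely for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Hypothetical validation errors from two versions of a model (illustrative data only).
    errors_model_a = rng.normal(loc=0.30, scale=0.05, size=30)
    errors_model_b = rng.normal(loc=0.27, scale=0.05, size=30)

    # Two-sample t-test: is the difference in mean error larger than chance alone explains?
    t_stat, p_value = stats.ttest_ind(errors_model_a, errors_model_b)

    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Reject the null hypothesis: the mean errors differ.")
    else:
        print("Fail to reject the null hypothesis.")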

DIFFICULTY LEVEL: BEGINNER

Instructor Bio:

Dr. Jon Krohn

Chief Data Scientist, Author of Deep Learning Illustrated | untapt

Jon Krohn is Chief Data Scientist at the machine learning company untapt. He authored the 2019 book Deep Learning Illustrated, an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in person at Columbia University, New York University, and the NYC Data Science Academy. Jon holds a Ph.D. in neuroscience from Oxford and has been publishing on machine learning in leading academic journals since 2010; his papers have been cited over a thousand times.

Course Outline

1: Introduction to Probability

  • What Probability Theory Is
  • A Brief History: Frequentists vs Bayesians
  • Applications of Probability to Machine Learning
  • Random Variables
  • Discrete vs Continuous Variables
  • Probability Mass and Probability Density Functions
  • Expected Value
  • Measures of Central Tendency: Mean, Median, and Mode
  • Quantiles: Quartiles, Deciles, and Percentiles
  • The Box-and-Whisker Plot
  • Measures of Dispersion: Variance, Standard Deviation, and Standard Error
  • Measures of Relatedness: Covariance and Correlation
  • Marginal and Conditional Probabilities
  • Independence and Conditional Independence
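
As a taste of the descriptive statistics in this section, the short Python sketch below (with made-up values) computes the mean, variance, standard deviation, covariance, and correlation of two small samples:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # roughly 2 * x, so strongly correlated

    print("mean(x):", np.mean(x))                 # measure of central tendency
    print("var(x): ", np.var(x, ddof=1))          # sample variance (dispersion)
    print("std(x): ", np.std(x, ddof=1))          # sample standard deviation

    # Covariance matrix: diagonal holds the variances, off-diagonal the covariance of x and y.
    print("cov(x, y): ", np.cov(x, y)[0, 1])

    # Pearson correlation: covariance rescaled to the range [-1, 1].
    print("corr(x, y):", np.corrcoef(x, y)[0, 1])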


2: Distributions in Machine Learning

  • Uniform
  • Gaussian: Normal and Standard Normal
  • The Central Limit Theorem
  • Log-Normal
  • Exponential and Laplace
  • Binomial and Multinomial
  • Poisson
  • Mixture Distributions
  • Preprocessing Data for Model Input
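
The sketch below (all parameters are illustrative assumptions) samples from a few of these distributions with NumPy and then standardizes one sample, a typical form of the preprocessing named in the final bullet:

    import numpy as np

    rng = np.random.default_rng(0)

    # Draw samples from three of the distributions listed above (parameters are illustrative).
    uniform_sample = rng.uniform(low=0.0, high=1.0, size=1_000)
    normal_sample = rng.normal(loc=100.0, scale=15.0, size=1_000)
    poisson_sample = rng.poisson(lam=4.0, size=1_000)

    print("uniform mean:", uniform_sample.mean())   # close to 0.5
    print("poisson mean:", poisson_sample.mean())   # close to lam = 4.0

    # Standardization (z-scoring): rescale to mean 0 and standard deviation 1,
    # a common way to preprocess a feature before feeding it into a model.
    z = (normal_sample - normal_sample.mean()) / normal_sample.std()
    print("standardized mean:", round(z.mean(), 3), "std:", round(z.std(), 3))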


3: Information Theory

  • What Information Theory Is
  • Self-Information
  • Nats, Bits, and Shannons
  • Shannon and Differential Entropy
  • Kullback-Leibler Divergence
  • Cross-Entropy
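
A minimal sketch, with made-up distributions, of how these quantities relate: the cross-entropy H(p, q) equals the entropy H(p) plus the Kullback-Leibler divergence D(p || q):

    import numpy as np

    p = np.array([0.1, 0.4, 0.5])    # "true" distribution (illustrative)
    q = np.array([0.3, 0.3, 0.4])    # model's predicted distribution (illustrative)

    entropy_p = -np.sum(p * np.log2(p))            # Shannon entropy H(p), in bits
    kl_pq = np.sum(p * np.log2(p / q))             # KL divergence D(p || q)
    cross_entropy = -np.sum(p * np.log2(q))        # cross-entropy H(p, q)

    print(f"H(p)           = {entropy_p:.3f} bits")
    print(f"D(p || q)      = {kl_pq:.3f} bits")
    print(f"H(p) + D(p||q) = {entropy_p + kl_pq:.3f} bits  (equals H(p, q) = {cross_entropy:.3f})")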


4: Frequentist Statistics

  • Frequentist vs Bayesian Statistics
  • Review of Relevant Probability Theory
  • z-scores and Outliers
  • p-values
  • Comparing Means with t-tests
  • Confidence Intervals
  • ANOVA: Analysis of Variance
  • Pearson Correlation Coefficient
  • R-Squared Coefficient of Determination
  • Correlation vs Causation
  • Correcting for Multiple Comparisons
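
Here’s a minimal sketch of two of these tools, a one-way ANOVA and the Pearson correlation coefficient (with R-squared as its square for a simple linear fit), using SciPy and synthetic data that is purely illustrative:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # One-way ANOVA across three illustrative groups: do their means differ?
    group_a = rng.normal(5.0, 1.0, size=25)
    group_b = rng.normal(5.5, 1.0, size=25)
    group_c = rng.normal(6.0, 1.0, size=25)
    f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
    print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

    # Pearson correlation between two illustrative variables, and R-squared:
    # the proportion of variance in y explained by a straight-line fit on x.
    x = rng.normal(size=100)
    y = 2.0 * x + rng.normal(scale=0.5, size=100)
    r, p_corr = stats.pearsonr(x, y)
    print(f"Pearson r = {r:.3f}, R-squared = {r**2:.3f}, p = {p_corr:.4g}")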


5: Regression

  • Features: Independent vs Dependent Variables
  • Linear Regression to Predict Continuous Values
  • Fitting a Line to Points on a Cartesian Plane
  • Ordinary Least Squares
  • Logistic Regression to Predict Categories
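
A minimal sketch of both flavors of regression named here, using scikit-learn on synthetic data (the feature, targets, and parameters are illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(2)

    # Linear regression: predict a continuous value from a single feature.
    x = rng.uniform(0, 10, size=(100, 1))                        # independent variable (feature)
    y = 3.0 * x[:, 0] + 1.0 + rng.normal(scale=1.0, size=100)    # dependent variable
    linreg = LinearRegression().fit(x, y)                        # fit via ordinary least squares
    print("slope:", linreg.coef_[0], "intercept:", linreg.intercept_)

    # Logistic regression: predict a binary category from the same feature.
    labels = (x[:, 0] + rng.normal(scale=1.0, size=100) > 5.0).astype(int)  # noisy binary target
    logreg = LogisticRegression().fit(x, labels)
    print("P(class 1 | x = 7):", logreg.predict_proba([[7.0]])[0, 1])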


6: Bayesian Statistics

  • (Deep) ML vs Frequentist Statistics
  • When to use Bayesian Statistics
  • Prior Probabilities
  • Bayes’ Theorem
  • PyMC3 Notebook
  • Resources for Further Study of Probability and Statistics
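
To close, here’s a minimal sketch of Bayes’ theorem in action, using a simple grid over candidate coin biases rather than the PyMC3 notebook used in class; the flat prior and the observed flips below are illustrative assumptions:

    import numpy as np

    # Grid of candidate values for theta, the probability that a coin lands heads.
    theta = np.linspace(0.01, 0.99, 99)

    prior = np.ones_like(theta)                  # flat prior: every bias equally plausible
    prior /= prior.sum()

    heads, flips = 8, 10                         # observed data (illustrative)
    likelihood = theta**heads * (1 - theta)**(flips - heads)

    # Bayes' theorem: the posterior is proportional to likelihood times prior.
    posterior = likelihood * prior
    posterior /= posterior.sum()

    print("posterior mean of theta:", np.sum(theta * posterior))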