Consisting of 14 on-demand training modules, this course provides a comprehensive overview of all of the subjects, across mathematics, statistics, and computer science, that underlie contemporary machine learning approaches, including deep learning and other artificial intelligence techniques.

If you use high-level software libraries (e.g., scikit-learn, Keras, TensorFlow, PyTorch) to train or deploy machine learning algorithms and would now like to understand the fundamentals underlying those abstractions, this course is for you: it will enable you to expand your capabilities.

Jon Krohn is Chief Data Scientist at the machine learning company untapt. He authored the 2019 book Deep Learning Illustrated, an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in person at Columbia University, New York University, and the NYC Data Science Academy. Jon holds a Ph.D. in neuroscience from Oxford and has been publishing on machine learning in leading academic journals since 2010; his papers have been cited over a thousand times.

## BOOTCAMP OVERVIEW BY DR. JON KROHN

1. Linear Algebra Course (3 modules)

• Intro to Linear Algebra

• Linear Algebra II: Matrix Operations

2. Calculus Course (4 modules)

• Calculus I: Limits & Derivatives

• Calculus II: Partial Derivatives & Integrals

3. Probability and Statistics Course (4 modules)

• Probability and Information Theory

• Intro to Statistics

4. Computer Science Course (3 modules)

• Algorithms and Data Structures

• Optimization

## LINEAR ALGEBRA

### 1. Data Structures for Algebra

• What Linear Algebra Is
• A Brief History of Algebra
• Tensors
• Scalars
• Vectors and Vector Transposition
• Norms and Unit Vectors
• Basis, Orthogonal, and Orthonormal Vectors
• Arrays in NumPy
• Matrices
• Tensors in TensorFlow and PyTorch
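
A minimal sketch of these data structures in code (NumPy only here; the commented lines show the analogous PyTorch and TensorFlow calls, assuming those libraries are installed):

```python
import numpy as np

x = np.array(25)                    # scalar (rank-0 tensor)
v = np.array([1, 2, 3])             # vector (rank-1 tensor)
v_col = v.reshape(-1, 1)            # column form; transposing a 1-D array requires an explicit 2-D shape
M = np.array([[1, 2], [3, 4]])      # matrix (rank-2 tensor)

length = np.linalg.norm(v)          # L2 norm of v
unit_v = v / length                 # unit vector: its norm is 1.0
print(length, np.linalg.norm(unit_v))

# analogous rank-2 tensors in PyTorch and TensorFlow:
# torch.tensor([[1, 2], [3, 4]])
# tf.constant([[1, 2], [3, 4]])
```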

### 2. Common Tensor Operations

• Tensor Transposition
• Basic Tensor Arithmetic
• Reduction
• The Dot Product
• Solving Linear Systems
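
A quick illustration of these operations in NumPy (the matrix A and vector b are made up for the example):

```python
import numpy as np

A = np.array([[4., 2.], [-5., -3.]])
b = np.array([4., -7.])

A.T                        # tensor transposition
A + 2                      # basic arithmetic is element-wise
A.sum(), A.sum(axis=0)     # reduction: over all elements, or per column

x = np.array([1., 2.])
y = np.array([3., 4.])
print(np.dot(x, y))        # dot product: 1*3 + 2*4 = 11

w = np.linalg.solve(A, b)  # solve the linear system A @ w = b
print(w)                   # [-1.  4.]
```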

### 3. Matrix Properties

• The Frobenius Norm
• Matrix Multiplication
• Symmetric and Identity Matrices
• Matrix Inversion
• Diagonal Matrices
• Orthogonal Matrices
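
A short NumPy sketch of these properties, again with made-up matrices:

```python
import numpy as np

A = np.array([[4., 2.], [-5., -3.]])
I = np.eye(2)                          # the identity matrix

print(np.linalg.norm(A))               # Frobenius norm (NumPy's default for matrices)
print(A @ A)                           # matrix multiplication
A_inv = np.linalg.inv(A)               # matrix inversion
print(np.allclose(A_inv @ A, I))       # True: the inverse "undoes" A

D = np.diag([2., 5.])                  # a diagonal matrix
Q = np.array([[0., 1.], [1., 0.]])     # an orthogonal matrix
print(np.allclose(Q.T @ Q, I))         # True: Q.T @ Q yields the identity
```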

### 4. Eigendecomposition

• Eigenvectors
• Eigenvalues
• Matrix Determinants
• Matrix Decomposition
• Application of Eigendecomposition
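
Eigendecomposition in a few lines of NumPy, including rebuilding an illustrative matrix from its eigenvectors and eigenvalues:

```python
import numpy as np

A = np.array([[4., 2.], [-5., -3.]])

lambdas, V = np.linalg.eig(A)          # eigenvalues and eigenvectors of A
print(np.linalg.det(A))                # matrix determinant: -2.0

# the decomposition itself: A = V @ diag(lambdas) @ V^-1
A_rebuilt = V @ np.diag(lambdas) @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))       # True
```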

### 5. Matrix Operations for Machine Learning

• Singular Value Decomposition (SVD)
• The Moore-Penrose Pseudoinverse
• The Trace Operator
• Principal Component Analysis (PCA): A Simple Machine Learning Algorithm
• Resources for Further Study of Linear Algebra
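
A sketch of these operations, including a bare-bones PCA via the eigendecomposition of a covariance matrix; the matrix A and the random data X are purely illustrative:

```python
import numpy as np

A = np.array([[-1., 2.], [3., -2.], [5., 7.]])

U, d, VT = np.linalg.svd(A)            # singular value decomposition
A_pinv = np.linalg.pinv(A)             # Moore-Penrose pseudoinverse (built on SVD)
print(np.trace(A.T @ A))               # trace operator: equals the squared Frobenius norm

# bare-bones PCA: project centered data onto the top eigenvector of its covariance
X = np.random.default_rng(42).normal(size=(100, 2))
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc.T))
pc1 = Xc @ evecs[:, [-1]]              # eigh sorts ascending, so the last column is the top component
print(pc1.shape)                       # (100, 1): first principal component scores
```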

## CALCULUS

### 1. Limits

• What Calculus Is
• A Brief History of Calculus
• The Method of Exhaustion

### 2. Computing Derivatives with Differentiation

• The Delta Method
• Basic Derivative Properties
• The Power Rule
• The Sum Rule
• The Product Rule
• The Quotient Rule
• The Chain Rule
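
The delta method can be sketched numerically in a few lines and checked against the power and chain rules; the function and step size below are illustrative choices:

```python
# the delta method: f'(x) ~ (f(x + delta) - f(x)) / delta, for small delta
def derivative(f, x, delta=1e-6):
    return (f(x + delta) - f(x)) / delta

f = lambda x: x**2                          # power rule gives f'(x) = 2x
print(derivative(f, 3.0))                   # ~6.0
print(derivative(lambda x: f(x)**3, 3.0))   # chain rule: d/dx (x^2)^3 = 6x^5 = 1458
```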

### 3. Automatic Differentiation

• AutoDiff with PyTorch
• AutoDiff with TensorFlow 2
• Relating Differentiation to Machine Learning
• Cost (or Loss) Functions
• The Future: Differentiable Programming
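
A minimal autodiff example in both libraries (assuming PyTorch and TensorFlow 2 are installed):

```python
import torch
import tensorflow as tf

# PyTorch: reverse-mode autodiff via .backward()
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)                        # tensor(4.) since dy/dx = 2x

# TensorFlow 2: the same gradient via GradientTape
x_tf = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y_tf = x_tf ** 2
print(tape.gradient(y_tf, x_tf))     # 4.0
```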

### 4. Gradients Applied to Machine Learning

• Partial Derivatives of Multivariate Functions
• The Partial-Derivative Chain Rule
• Cost (or Loss) Functions
• Backpropagation
• Higher-Order Partial Derivatives
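
Partial derivatives of a multivariate function via PyTorch's autodiff, as a small illustration (the function is made up):

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x**2 + y**3               # an illustrative multivariate function
z.backward()                  # computes all partial derivatives at once
print(x.grad, y.grad)         # dz/dx = 2x = 2 ; dz/dy = 3y^2 = 12
```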

### 5. Integrals

• Binary Classification
• The Confusion Matrix
• The Receiver-Operating Characteristic (ROC) Curve
• Calculating Integrals Manually
• Numeric Integration with Python
• Finding the Area Under the ROC Curve
• Resources for Further Study of Calculus
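
Numeric integration in Python, first with SciPy's quad and then as area under hypothetical ROC-curve points (assuming SciPy and scikit-learn are installed):

```python
import numpy as np
from scipy.integrate import quad
from sklearn.metrics import auc

area, err = quad(lambda x: x**2, 0, 2)    # definite integral of x^2 over [0, 2]
print(area)                               # ~2.667, i.e., 8/3

fpr = np.array([0.0, 0.25, 0.5, 1.0])     # hypothetical false-positive rates
tpr = np.array([0.0, 0.7, 0.9, 1.0])      # hypothetical true-positive rates
print(auc(fpr, tpr))                      # area under the ROC curve: 0.7625
```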

## PROBABILITY AND STATISTICS

### 1. Introduction to Probability

• What Probability Theory Is
• A Brief History: Frequentists vs Bayesians
• Applications of Probability to Machine Learning
• Random Variables
• Discrete vs Continuous Variables
• Probability Mass and Probability Density Functions
• Expected Value
• Measures of Central Tendency: Mean, Median, and Mode
• Quantiles: Quartiles, Deciles, and Percentiles
• The Box-and-Whisker Plot
• Measures of Dispersion: Variance, Standard Deviation, and Standard Error
• Measures of Relatedness: Covariance and Correlation
• Marginal and Conditional Probabilities
• Independence and Conditional Independence
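
Many of these summary statistics are one-liners in NumPy; a sketch with made-up data:

```python
import numpy as np

x = np.array([1., 2., 2., 3., 4., 7., 9.])

print(np.mean(x), np.median(x))          # measures of central tendency
print(np.percentile(x, [25, 50, 75]))    # quartiles
print(np.var(x), np.std(x))              # measures of dispersion
print(np.std(x) / np.sqrt(len(x)))       # standard error of the mean

y = 2 * x + np.random.default_rng(0).normal(size=len(x))
print(np.cov(x, y))                      # covariance matrix of x and y
print(np.corrcoef(x, y))                 # correlation matrix of x and y
```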

### 2. Distributions in Machine Learning

• Uniform

• Gaussian: Normal and Standard Normal
• The Central Limit Theorem
• Log-Normal
• Binomial and Multinomial
• Poisson
• Mixture Distributions
• Preprocessing Data for Model Input
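
Sampling from these distributions with NumPy, plus a quick demonstration of the central limit theorem:

```python
import numpy as np

rng = np.random.default_rng(42)

rng.uniform(0, 1, size=5)            # uniform
rng.normal(0, 1, size=5)             # standard normal (Gaussian)
rng.binomial(n=10, p=0.5, size=5)    # binomial
rng.poisson(lam=3, size=5)           # Poisson

# central limit theorem: means of many samples are ~normally distributed,
# regardless of the underlying (here, uniform) distribution
sample_means = rng.uniform(0, 1, size=(10_000, 30)).mean(axis=1)
print(sample_means.mean(), sample_means.std())   # ~0.5, ~sqrt(1/12)/sqrt(30)
```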

### 3. Information Theory

• What Information Theory Is
• Self-Information
• Nats, Bits, and Shannons
• Shannon and Differential Entropy
• Kullback-Leibler Divergence
• Cross-Entropy
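
These information-theoretic quantities reduce to a few NumPy lines for discrete distributions (p and q below are illustrative):

```python
import numpy as np

p = np.array([0.25, 0.75])     # a discrete distribution over two events
q = np.array([0.50, 0.50])     # a second distribution over the same events

entropy = -np.sum(p * np.log(p))        # Shannon entropy, in nats
kl_pq = np.sum(p * np.log(p / q))       # Kullback-Leibler divergence D(p || q)
cross = -np.sum(p * np.log(q))          # cross-entropy of q relative to p
print(np.isclose(cross, entropy + kl_pq))   # True: cross-entropy = entropy + KL
```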

### 4. Frequentist Statistics

• Frequentist vs Bayesian Statistics
• Review of Relevant Probability Theory
• z-scores and Outliers
• p-values
• Comparing Means with t-tests
• Confidence Intervals
• ANOVA: Analysis of Variance
• Pearson Correlation Coefficient
• R-Squared Coefficient of Determination
• Correlation vs Causation
• Correcting for Multiple Comparisons
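
A sketch of a few of these tests with SciPy; the two samples are simulated for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(0.5, 1.0, size=50)

t, p = stats.ttest_ind(a, b)          # compare means with a two-sample t-test
r, p_r = stats.pearsonr(a, b)         # Pearson correlation coefficient
z = (a - a.mean()) / a.std()          # z-scores; |z| > 3 is a common outlier flag
print(f"t = {t:.2f}, p = {p:.4f}, r^2 = {r**2:.3f}")
```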

### 5. Regression

• Features: Independent vs Dependent Variables
• Linear Regression to Predict Continuous Values
• Fitting a Line to Points on a Cartesian Plane
• Ordinary Least Squares
• Logistic Regression to Predict Categories
• (Deep) ML vs Frequentist Statistics
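
Ordinary least squares from scratch with NumPy's lstsq, fitting a line to simulated points:

```python
import numpy as np

# fit y = m*x + b to noisy points drawn from a known line
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)                     # independent variable (feature)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=50)   # dependent variable

X = np.column_stack([x, np.ones_like(x)])      # design matrix with an intercept column
(m, b), *_ = np.linalg.lstsq(X, y, rcond=None) # ordinary least squares
print(m, b)                                    # ~2.0, ~1.0
```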

### 6. Bayesian Statistics

• When to use Bayesian Statistics
• Prior Probabilities
• Bayes’ Theorem
• PyMC3 Notebook
• Resources for Further Study of Probability and Statistics
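
Bayes' theorem as a tiny function; the sensitivity, false-positive rate, and prior below are hypothetical numbers:

```python
# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
def posterior(prior, likelihood, false_alarm_rate):
    evidence = likelihood * prior + false_alarm_rate * (1 - prior)
    return likelihood * prior / evidence

# e.g., a test with 99% sensitivity and a 5% false-positive rate,
# applied to a condition with a 1% prior probability:
print(posterior(prior=0.01, likelihood=0.99, false_alarm_rate=0.05))  # ~0.167
```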

## COMPUTER SCIENCE

### 1. Introduction to Data Structures and Algorithms

• A Brief History of Data
• A Brief History of Algorithms
• “Big O” Notation for Time and Space Complexity

### 2. Lists and Dictionaries

• List-Based Data Structures: Arrays, Linked Lists, Stacks, Queues, and Deques
• Searching and Sorting: Binary, Bubble, Merge, and Quick
• Set-Based Data Structures: Maps and Dictionaries
• Hashing: Hash Tables, Load Factors, and Hash Maps
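
A classic example: binary search runs in O(log n) on a sorted list, while Python's dict gives O(1) average-case lookups via hashing. A minimal sketch:

```python
def binary_search(sorted_list, target):
    """O(log n) search, vs. O(n) for a linear scan."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2              # halve the search range each step
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 11], 7))   # 3

counts = {"cat": 2, "dog": 1}   # Python's dict is a hash table
print(counts["cat"])            # O(1) average-case lookup
```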

### 3. Trees and Graphs

• Trees: Decision Trees, Random Forests, and Gradient-Boosting (XGBoost)
• Graphs: Terminology, Directed Acyclic Graphs (DAGs)
• Resources for Further Study of Data Structures & Algorithms

### 4. The Machine Learning Approach to Optimization

• The Statistical Approach to Regression: Ordinary Least Squares
• When Statistical Approaches to Optimization Break Down
• The Machine Learning Solution

### 5. Gradient Descent

• Objective Functions
• Cost / Loss / Error Functions
• Minimizing Cost with Gradient Descent
• Learning Rate
• Critical Points, incl. Saddle Points
• Gradient Descent from Scratch with PyTorch
• The Global Minimum and Local Minima
• Mini-Batches and Stochastic Gradient Descent (SGD)
• Learning Rate Scheduling
• Maximizing Reward with Gradient Ascent
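
As a taste of the "from scratch" segment above, here is a minimal gradient-descent sketch in PyTorch that fits a line by minimizing mean squared error; the data and hyperparameters are made up:

```python
import torch

# fit y = m*x + b by descending the gradient of a mean-squared-error cost
torch.manual_seed(42)
x = torch.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + torch.randn(50)        # noisy points on a known line

m = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
lr = 0.02                                  # learning rate

for _ in range(1000):
    cost = ((m * x + b - y) ** 2).mean()   # MSE cost function
    cost.backward()                        # gradients via autodiff
    with torch.no_grad():                  # step against the gradient
        m -= lr * m.grad
        b -= lr * b.grad
        m.grad.zero_()
        b.grad.zero_()
print(m.item(), b.item())                  # ~2.0 and ~1.0
```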

### 6. Fancy Deep Learning Optimizers

• A Layer of Artificial Neurons in PyTorch
• Jacobian Matrices
• Hessian Matrices and Second-Order Optimization
• Momentum
• Nesterov Momentum
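
A minimal sketch of these ideas via PyTorch's torch.optim.SGD, which implements both momentum and Nesterov momentum; the layer and data are illustrative:

```python
import torch

model = torch.nn.Linear(2, 1)     # a layer of artificial neurons

# three flavors of SGD: plain, with momentum, and with Nesterov momentum
opt = torch.optim.SGD(model.parameters(), lr=0.01)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)

x = torch.randn(8, 2)             # illustrative inputs
loss = (model(x) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()                        # one momentum-accelerated parameter update
```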