Consisting of 14 on-demand training modules, this course provides a comprehensive overview of all of the subjects, across mathematics, statistics, and computer science, that underlie contemporary machine learning approaches, including deep learning and other artificial intelligence techniques.

If you use high-level software libraries (e.g., scikit-learn, Keras, TensorFlow, PyTorch) to train or deploy machine learning algorithms and would now like to understand the fundamentals underlying those abstractions, this course is for you: it will enable you to expand your capabilities.

Jon Krohn is Chief Data Scientist at the machine learning company untapt. He authored the 2019 book Deep Learning Illustrated, an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in person at Columbia University, New York University, and the NYC Data Science Academy. Jon holds a Ph.D. in neuroscience from Oxford and has been publishing on machine learning in leading academic journals since 2010; his papers have been cited over a thousand times.

## BOOTCAMP OVERVIEW BY DR. JON KROHN

1. Linear Algebra Course (3 modules)

• Intro to Linear Algebra

• Linear Algebra II: Matrix Operations

2. Calculus Course (4 modules)

• Calculus I: Limits & Derivatives

• Calculus II: Partial Derivatives & Integrals

3. Probability and Statistics Course (4 modules)

• Probability and Information Theory

• Intro to Statistics

4. Computer Science Course (3 modules)

• Algorithms and Data Structures

• Optimization

## LINEAR ALGEBRA

### 1. Data Structures for Algebra

• What Linear Algebra Is
• A Brief History of Algebra
• Tensors
• Scalars
• Vectors and Vector Transposition
• Norms and Unit Vectors
• Basis, Orthogonal, and Orthonormal Vectors
• Arrays in NumPy
• Matrices
• Tensors in TensorFlow and PyTorch
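
A minimal sketch of these data structures in code (NumPy only here; the commented lines show the analogous PyTorch and TensorFlow calls, assuming those libraries are installed):

```python
import numpy as np

x = np.array(25)                    # scalar (rank-0 tensor)
v = np.array([1, 2, 3])             # vector (rank-1 tensor)
v_col = v.reshape(-1, 1)            # column form; transposing a 1-D array requires an explicit 2-D shape
M = np.array([[1, 2], [3, 4]])      # matrix (rank-2 tensor)

length = np.linalg.norm(v)          # L2 norm of v
unit_v = v / length                 # unit vector: its norm is 1.0
print(length, np.linalg.norm(unit_v))

# analogous rank-2 tensors in PyTorch and TensorFlow:
# torch.tensor([[1, 2], [3, 4]])
# tf.constant([[1, 2], [3, 4]])
```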

### 2. Common Tensor Operations

• Tensor Transposition
• Basic Tensor Arithmetic
• Reduction
• The Dot Product
• Solving Linear Systems
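
A quick illustration of these operations in NumPy (the matrix A and vector b are made up for the example):

```python
import numpy as np

A = np.array([[4., 2.], [-5., -3.]])
b = np.array([4., -7.])

A.T                        # tensor transposition
A + 2                      # basic arithmetic is element-wise
A.sum(), A.sum(axis=0)     # reduction: over all elements, or per column

x = np.array([1., 2.])
y = np.array([3., 4.])
print(np.dot(x, y))        # dot product: 1*3 + 2*4 = 11

w = np.linalg.solve(A, b)  # solve the linear system A @ w = b
print(w)                   # [-1.  4.]
```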

### 3. Matrix Properties

• The Frobenius Norm
• Matrix Multiplication
• Symmetric and Identity Matrices
• Matrix Inversion
• Diagonal Matrices
• Orthogonal Matrices
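
A short NumPy sketch of these properties, again with made-up matrices:

```python
import numpy as np

A = np.array([[4., 2.], [-5., -3.]])
I = np.eye(2)                          # the identity matrix

print(np.linalg.norm(A))               # Frobenius norm (NumPy's default for matrices)
print(A @ A)                           # matrix multiplication
A_inv = np.linalg.inv(A)               # matrix inversion
print(np.allclose(A_inv @ A, I))       # True: the inverse "undoes" A

D = np.diag([2., 5.])                  # a diagonal matrix
Q = np.array([[0., 1.], [1., 0.]])     # an orthogonal matrix
print(np.allclose(Q.T @ Q, I))         # True: Q.T @ Q yields the identity
```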

### 4. Eigendecomposition

• Eigenvectors
• Eigenvalues
• Matrix Determinants
• Matrix Decomposition
• Application of Eigendecomposition
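
Eigendecomposition in a few lines of NumPy, including rebuilding an illustrative matrix from its eigenvectors and eigenvalues:

```python
import numpy as np

A = np.array([[4., 2.], [-5., -3.]])

lambdas, V = np.linalg.eig(A)          # eigenvalues and eigenvectors of A
print(np.linalg.det(A))                # matrix determinant: -2.0

# the decomposition itself: A = V @ diag(lambdas) @ V^-1
A_rebuilt = V @ np.diag(lambdas) @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))       # True
```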

### 5. Matrix Operations for Machine Learning

• Singular Value Decomposition (SVD)
• The Moore-Penrose Pseudoinverse
• The Trace Operator
• Principal Component Analysis (PCA): A Simple Machine Learning Algorithm
• Resources for Further Study of Linear Algebra
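
A sketch of these operations, including a bare-bones PCA via the eigendecomposition of a covariance matrix; the matrix A and the random data X are purely illustrative:

```python
import numpy as np

A = np.array([[-1., 2.], [3., -2.], [5., 7.]])

U, d, VT = np.linalg.svd(A)            # singular value decomposition
A_pinv = np.linalg.pinv(A)             # Moore-Penrose pseudoinverse (built on SVD)
print(np.trace(A.T @ A))               # trace operator: equals the squared Frobenius norm

# bare-bones PCA: project centered data onto the top eigenvector of its covariance
X = np.random.default_rng(42).normal(size=(100, 2))
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc.T))
pc1 = Xc @ evecs[:, [-1]]              # eigh sorts ascending, so the last column is the top component
print(pc1.shape)                       # (100, 1): first principal component scores
```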

## CALCULUS

### 1. Limits

• What Calculus Is
• A Brief History of Calculus
• The Method of Exhaustion

### 2. Computing Derivatives with Differentiation

• The Delta Method
• Basic Derivative Properties
• The Power Rule
• The Sum Rule
• The Product Rule
• The Quotient Rule
• The Chain Rule
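
The delta method can be sketched numerically in a few lines and checked against the power and chain rules; the function and step size below are illustrative choices:

```python
# the delta method: f'(x) ~ (f(x + delta) - f(x)) / delta, for small delta
def derivative(f, x, delta=1e-6):
    return (f(x + delta) - f(x)) / delta

f = lambda x: x**2                          # power rule gives f'(x) = 2x
print(derivative(f, 3.0))                   # ~6.0
print(derivative(lambda x: f(x)**3, 3.0))   # chain rule: d/dx (x^2)^3 = 6x^5 = 1458
```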

### 3. Automatic Differentiation

• AutoDiff with PyTorch
• AutoDiff with TensorFlow 2
• Relating Differentiation to Machine Learning
• Cost (or Loss) Functions
• The Future: Differentiable Programming
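
A minimal autodiff example in both libraries (assuming PyTorch and TensorFlow 2 are installed):

```python
import torch
import tensorflow as tf

# PyTorch: reverse-mode autodiff via .backward()
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)                        # tensor(4.) since dy/dx = 2x

# TensorFlow 2: the same gradient via GradientTape
x_tf = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y_tf = x_tf ** 2
print(tape.gradient(y_tf, x_tf))     # 4.0
```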

### 4. Gradients Applied to Machine Learning

• Partial Derivatives of Multivariate Functions
• The Partial-Derivative Chain Rule
• Cost (or Loss) Functions
• Backpropagation
• Higher-Order Partial Derivatives
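
Partial derivatives of a multivariate function via PyTorch's autodiff, as a small illustration (the function is made up):

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x**2 + y**3               # an illustrative multivariate function
z.backward()                  # computes all partial derivatives at once
print(x.grad, y.grad)         # dz/dx = 2x = 2 ; dz/dy = 3y^2 = 12
```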

### 5. Integrals

• Binary Classification
• The Confusion Matrix
• The Receiver-Operating Characteristic (ROC) Curve
• Calculating Integrals Manually
• Numeric Integration with Python
• Finding the Area Under the ROC Curve
• Resources for Further Study of Calculus
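
Numeric integration in Python, first with SciPy's quad and then as area under hypothetical ROC-curve points (assuming SciPy and scikit-learn are installed):

```python
import numpy as np
from scipy.integrate import quad
from sklearn.metrics import auc

area, err = quad(lambda x: x**2, 0, 2)    # definite integral of x^2 over [0, 2]
print(area)                               # ~2.667, i.e., 8/3

fpr = np.array([0.0, 0.25, 0.5, 1.0])     # hypothetical false-positive rates
tpr = np.array([0.0, 0.7, 0.9, 1.0])      # hypothetical true-positive rates
print(auc(fpr, tpr))                      # area under the ROC curve: 0.7625
```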

## PROBABILITY AND STATISTICS

### 1. Introduction to Probability

• What Probability Theory Is
• A Brief History: Frequentists vs Bayesians
• Applications of Probability to Machine Learning
• Random Variables
• Discrete vs Continuous Variables
• Probability Mass and Probability Density Functions
• Expected Value
• Measures of Central Tendency: Mean, Median, and Mode
• Quantiles: Quartiles, Deciles, and Percentiles
• The Box-and-Whisker Plot
• Measures of Dispersion: Variance, Standard Deviation, and Standard Error
• Measures of Relatedness: Covariance and Correlation
• Marginal and Conditional Probabilities
• Independence and Conditional Independence
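
Many of these summary statistics are one-liners in NumPy; a sketch with made-up data:

```python
import numpy as np

x = np.array([1., 2., 2., 3., 4., 7., 9.])

print(np.mean(x), np.median(x))          # measures of central tendency
print(np.percentile(x, [25, 50, 75]))    # quartiles
print(np.var(x), np.std(x))              # measures of dispersion
print(np.std(x) / np.sqrt(len(x)))       # standard error of the mean

y = 2 * x + np.random.default_rng(0).normal(size=len(x))
print(np.cov(x, y))                      # covariance matrix of x and y
print(np.corrcoef(x, y))                 # correlation matrix of x and y
```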

### 2. Distributions in Machine Learning

• Uniform

• Gaussian: Normal and Standard Normal
• The Central Limit Theorem
• Log-Normal
• Binomial and Multinomial
• Poisson
• Mixture Distributions
• Preprocessing Data for Model Input
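
Sampling from these distributions with NumPy, plus a quick demonstration of the central limit theorem:

```python
import numpy as np

rng = np.random.default_rng(42)

rng.uniform(0, 1, size=5)            # uniform
rng.normal(0, 1, size=5)             # standard normal (Gaussian)
rng.binomial(n=10, p=0.5, size=5)    # binomial
rng.poisson(lam=3, size=5)           # Poisson

# central limit theorem: means of many samples are ~normally distributed,
# regardless of the underlying (here, uniform) distribution
sample_means = rng.uniform(0, 1, size=(10_000, 30)).mean(axis=1)
print(sample_means.mean(), sample_means.std())   # ~0.5, ~sqrt(1/12)/sqrt(30)
```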

### 3. Information Theory

• What Information Theory Is
• Self-Information
• Nats, Bits, and Shannons
• Shannon and Differential Entropy
• Kullback-Leibler Divergence
• Cross-Entropy
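
These information-theoretic quantities reduce to a few NumPy lines for discrete distributions (p and q below are illustrative):

```python
import numpy as np

p = np.array([0.25, 0.75])     # a discrete distribution over two events
q = np.array([0.50, 0.50])     # a second distribution over the same events

entropy = -np.sum(p * np.log(p))        # Shannon entropy, in nats
kl_pq = np.sum(p * np.log(p / q))       # Kullback-Leibler divergence D(p || q)
cross = -np.sum(p * np.log(q))          # cross-entropy of q relative to p
print(np.isclose(cross, entropy + kl_pq))   # True: cross-entropy = entropy + KL
```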

### 4. Frequentist Statistics

• Frequentist vs Bayesian Statistics
• Review of Relevant Probability Theory
• z-scores and Outliers
• p-values
• Comparing Means with t-tests
• Confidence Intervals
• ANOVA: Analysis of Variance
• Pearson Correlation Coefficient
• R-Squared Coefficient of Determination
• Correlation vs Causation
• Correcting for Multiple Comparisons
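
A sketch of a few of these tests with SciPy; the two samples are simulated for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(0.5, 1.0, size=50)

t, p = stats.ttest_ind(a, b)          # compare means with a two-sample t-test
r, p_r = stats.pearsonr(a, b)         # Pearson correlation coefficient
z = (a - a.mean()) / a.std()          # z-scores; |z| > 3 is a common outlier flag
print(f"t = {t:.2f}, p = {p:.4f}, r^2 = {r**2:.3f}")
```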

### 5. Regression

• Features: Independent vs Dependent Variables
• Linear Regression to Predict Continuous Values
• Fitting a Line to Points on a Cartesian Plane
• Ordinary Least Squares
• Logistic Regression to Predict Categories
• (Deep) ML vs Frequentist Statistics
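
Ordinary least squares from scratch with NumPy's lstsq, fitting a line to simulated points:

```python
import numpy as np

# fit y = m*x + b to noisy points drawn from a known line
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)                     # independent variable (feature)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=50)   # dependent variable

X = np.column_stack([x, np.ones_like(x)])      # design matrix with an intercept column
(m, b), *_ = np.linalg.lstsq(X, y, rcond=None) # ordinary least squares
print(m, b)                                    # ~2.0, ~1.0
```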

### 6. Bayesian Statistics

• When to use Bayesian Statistics
• Prior Probabilities
• Bayes’ Theorem
• PyMC3 Notebook
• Resources for Further Study of Probability and Statistics
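
Bayes' theorem as a tiny function; the sensitivity, false-positive rate, and prior below are hypothetical numbers:

```python
# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
def posterior(prior, likelihood, false_alarm_rate):
    evidence = likelihood * prior + false_alarm_rate * (1 - prior)
    return likelihood * prior / evidence

# e.g., a test with 99% sensitivity and a 5% false-positive rate,
# applied to a condition with a 1% prior probability:
print(posterior(prior=0.01, likelihood=0.99, false_alarm_rate=0.05))  # ~0.167
```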

## COMPUTER SCIENCE

### 1. Introduction to Data Structures and Algorithms

• A Brief History of Data
• A Brief History of Algorithms
• “Big O” Notation for Time and Space Complexity

### 2. Lists and Dictionaries

• List-Based Data Structures: Arrays, Linked Lists, Stacks, Queues, and Deques
• Searching and Sorting: Binary, Bubble, Merge, and Quick
• Set-Based Data Structures: Maps and Dictionaries
• Hashing: Hash Tables, Load Factors, and Hash Maps
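
A classic example: binary search runs in O(log n) on a sorted list, while Python's dict gives O(1) average-case lookups via hashing. A minimal sketch:

```python
def binary_search(sorted_list, target):
    """O(log n) search, vs. O(n) for a linear scan."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2              # halve the search range each step
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 11], 7))   # 3

counts = {"cat": 2, "dog": 1}   # Python's dict is a hash table
print(counts["cat"])            # O(1) average-case lookup
```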

### 3. Trees and Graphs

• Trees: Decision Trees, Random Forests, and Gradient-Boosting (XGBoost)
• Graphs: Terminology, Directed Acyclic Graphs (DAGs)
• Resources for Further Study of Data Structures & Algorithms

### 4. The Machine Learning Approach to Optimization

• The Statistical Approach to Regression: Ordinary Least Squares
• When Statistical Approaches to Optimization Break Down
• The Machine Learning Solution

### 5. Gradient Descent

• Objective Functions
• Cost / Loss / Error Functions
• Minimizing Cost with Gradient Descent
• Learning Rate
• Critical Points, incl. Saddle Points
• Gradient Descent from Scratch with PyTorch
• The Global Minimum and Local Minima
• Mini-Batches and Stochastic Gradient Descent (SGD)
• Learning Rate Scheduling
• Maximizing Reward with Gradient Ascent
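
As a taste of the "from scratch" segment above, here is a minimal gradient-descent sketch in PyTorch that fits a line by minimizing mean squared error; the data and hyperparameters are made up:

```python
import torch

# fit y = m*x + b by descending the gradient of a mean-squared-error cost
torch.manual_seed(42)
x = torch.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + torch.randn(50)        # noisy points on a known line

m = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
lr = 0.02                                  # learning rate

for _ in range(1000):
    cost = ((m * x + b - y) ** 2).mean()   # MSE cost function
    cost.backward()                        # gradients via autodiff
    with torch.no_grad():                  # step against the gradient
        m -= lr * m.grad
        b -= lr * b.grad
        m.grad.zero_()
        b.grad.zero_()
print(m.item(), b.item())                  # ~2.0 and ~1.0
```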

### 6. Fancy Deep Learning Optimizers

• A Layer of Artificial Neurons in PyTorch
• Jacobian Matrices
• Hessian Matrices and Second-Order Optimization
• Momentum
• Nesterov Momentum
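
A minimal sketch of these ideas via PyTorch's torch.optim.SGD, which implements both momentum and Nesterov momentum; the layer and data are illustrative:

```python
import torch

model = torch.nn.Linear(2, 1)     # a layer of artificial neurons

# three flavors of SGD: plain, with momentum, and with Nesterov momentum
opt = torch.optim.SGD(model.parameters(), lr=0.01)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)

x = torch.randn(8, 2)             # illustrative inputs
loss = (model(x) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()                        # one momentum-accelerated parameter update
```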