Course Curriculum
- 15 sections
- 190 lectures
- 15:50:00 total length

Section 1
- Modeling an epidemic (00:08:00)
- The machine learning recipe (00:06:00)
- The components of a machine learning model (00:02:00)
- Why model? (00:03:00)
- On assumptions and can we get rid of them? (00:09:00)
- The case of AlphaZero (00:11:00)
- Overfitting/underfitting/bias/variance (00:11:00)
- Why use machine learning (00:05:00)

Section 2
- The InsureMe challenge (00:06:00)
- Supervised learning (00:05:00)
- Linear assumption (00:03:00)
- Linear regression template (00:07:00)
- Non-linear vs proportional vs linear (00:05:00)
- Linear regression template revisited (00:04:00)
- Loss function (00:03:00)
- Training algorithm (00:08:00)
- Code time (00:15:00; see the sketch after this section)
- R squared (00:06:00)
- Why use a linear model? (00:04:00)
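As a companion to this section's "Code time" lecture, here is a minimal linear-regression sketch. It assumes scikit-learn (the library the course's pipeline and scaler vocabulary points to); the InsureMe-style data is synthetic and purely illustrative.

```python
# Fit a linear model by least squares, then check R squared.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(18, 65, size=(200, 1))               # illustrative feature: age
y = 120.0 + 4.5 * X[:, 0] + rng.normal(0, 20, 200)   # linear trend plus noise

model = LinearRegression().fit(X, y)     # training: minimize the squared loss
print(model.coef_, model.intercept_)     # learned slope and intercept
print(r2_score(y, model.predict(X)))     # R squared on the training data
```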

Section 3
- Introduction to scaling (00:06:00)
- Min-max scaling (00:03:00)
- Code time (min-max scaling) (00:09:00)
- The problem with min-max scaling (00:03:00)
- What’s your IQ? (00:11:00)
- Standard scaling (00:04:00)
- Code time (standard scaling) (00:02:00)
- Model before and after scaling (00:05:00)
- Inference time (00:07:00)
- Pipelines (00:03:00)
- Code time (pipelines) (00:05:00; see the sketch after this section)
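A sketch of scaling inside a pipeline, so that the exact transformation learned at training time is reapplied at inference time. Again assuming scikit-learn; the numbers are made up.

```python
# Chain a scaler and a linear model so scaling becomes part of the model itself.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler  # swap in MinMaxScaler to compare

X = np.array([[1.0, 2000.0], [2.0, 3000.0], [3.0, 1000.0]])  # very different ranges
y = np.array([1.0, 2.0, 3.0])

pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)                        # scaler statistics learned on training data only
print(pipe.predict([[2.5, 1500.0]]))  # at inference, scaling is reapplied automatically
```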

Section 4
- Spurious correlations (00:04:00)
- L2 regularization (00:10:00)
- Code time (L2 regularization) (00:05:00)
- L2 results (00:02:00)
- L1 regularization (00:06:00)
- Code time (L1 regularization) (00:04:00)
- L1 results (00:02:00)
- Why does L1 encourage zeros? (00:09:00)
- L1 vs L2: Which one is best? (00:01:00; see the sketch after this section)
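A sketch of the L2-vs-L1 contrast in scikit-learn terms (Ridge and Lasso); the data is synthetic, with only one genuinely useful feature, so the zeroing effect of L1 is visible.

```python
# Ridge (L2) shrinks all coefficients; Lasso (L1) drives irrelevant ones to exactly 0.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, 100)   # only feature 0 matters

print(Ridge(alpha=1.0).fit(X, y).coef_)   # all 10 coefficients nonzero, feature 0 near 3
print(Lasso(alpha=0.1).fit(X, y).coef_)   # most coefficients exactly 0
```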

Section 5
- Introduction to validation (00:02:00)
- Why not evaluate model on training data (00:06:00)
- The validation set (00:05:00)
- Code time (validation set) (00:08:00)
- Error curves (00:06:00)
- Model selection (00:06:00)
- The problem with model selection (00:06:00)
- Tainted validation set (00:05:00)
- Monkeys with typewriters (00:03:00)
- My own validation epic fail (00:07:00)
- The test set (00:06:00)
- What if the model doesn’t pass the test? (00:05:00)
- How not to be fooled by randomness (00:02:00)
- Cross-validation (00:04:00)
- Code time (cross-validation) (00:07:00; see the sketch after this section)
- Cross-validation results summary (00:02:00)
- AutoML (00:05:00)
- Is AutoML a good idea? (00:05:00)
- Red flags: Don’t do this! (00:07:00)
- Red flags summary and what to do instead (00:05:00)
- Your job as a data scientist (00:03:00)
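A sketch of the two evaluation schemes this section contrasts: a hold-out validation set (with a final untouched test set) and k-fold cross-validation. Assumes scikit-learn; the model and data are illustrative.

```python
# Hold-out validation for model selection, plus k-fold cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Carve out a test set first and never touch it during model selection.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.25,
                                                  random_state=0)

for alpha in (0.1, 1.0, 10.0):   # pick the best alpha on the validation set only
    print(alpha, Ridge(alpha=alpha).fit(X_train, y_train).score(X_val, y_val))

# Cross-validation averages over 5 splits, so one lucky split can't fool you.
print(cross_val_score(Ridge(alpha=1.0), X_dev, y_dev, cv=5).mean())
```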

Section 6
- Intro and recap (00:02:00)
- Mistake #1: Data leakage (00:05:00)
- The golden rule (00:04:00)
- Helpful trick (feature importance) (00:02:00)
- Real example of data leakage (part 1) (00:05:00)
- Real example of data leakage (part 2) (00:05:00)
- Another (funny) example of data leakage (00:02:00)
- Mistake #2: Random split of dependent data (00:05:00)
- Another example (insurance data) (00:05:00)
- Mistake #3: Look-ahead bias (00:06:00)
- Example solutions to look-ahead bias (00:02:00)
- Consequences of look-ahead bias (00:02:00)
- How to split data to avoid look-ahead bias (00:03:00)
- Cross-validation with temporally related data (00:03:00; see the sketch after this section)
- Mistake #4: Building a model for one thing, using it for something else (00:04:00)
- Sketchy rationale (00:06:00)
- Why this matters for your career and job search (00:04:00)
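For splitting temporally related data without look-ahead bias, one option (an assumption, not necessarily the course's choice) is scikit-learn's TimeSeriesSplit, in which every training fold strictly precedes its validation fold.

```python
# Time-ordered cross-validation: the model is never validated on its own past.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)   # 12 observations in chronological order
for train_idx, val_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "validate:", val_idx)  # validation always comes later
```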

Section 7
- Classifying images of handwritten digits (00:07:00)
- Why the usual regression doesn’t work (00:04:00)
- Machine learning recipe recap (00:02:00)
- Logistic model template (binary) (00:13:00)
- Decision function and boundary (binary) (00:05:00)
- Logistic model template (multiclass) (00:14:00)
- Decision function and boundary (multiclass) (00:01:00)
- Summary: binary vs multiclass (00:01:00)
- Code time! (00:20:00; see the sketch after this section)
- Why the logistic model is often called logistic regression (00:05:00)
- One vs Rest, One vs One (00:05:00)
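A multiclass logistic-model sketch on scikit-learn's built-in digits dataset; the course's own MNIST setup may differ.

```python
# Multiclass logistic model: one score per digit class, softmax for probabilities.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)   # 8x8 grayscale digits, 10 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))                  # accuracy on held-out digits
print(clf.predict_proba(X_te[:1]).round(3))   # probabilities over the 10 classes
```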

Section 8
- Where we’re at (00:02:00)
- Brier score and why it doesn’t work (00:06:00)
- The likelihood function (00:11:00)
- Optimization task and numerical stability (00:03:00)
- Let’s improve the loss function (00:09:00)
- Loss value examples (00:05:00)
- Adding regularization (00:02:00)
- Binary cross-entropy loss (00:03:00; see the worked example after this section)
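For reference, the binary cross-entropy this section builds up to is the average negative log-likelihood of the true labels: BCE = -(1/N) * sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]. A minimal numeric sketch; the clipping used for numerical stability is a common convention, not necessarily the course's.

```python
# Binary cross-entropy loss on predicted probabilities.
import numpy as np

def bce(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)   # keep log() away from 0 and 1
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(bce(np.array([1, 0, 1]), np.array([0.9, 0.1, 0.8])))  # confident and right: ~0.14
print(bce(np.array([1, 0, 1]), np.array([0.1, 0.9, 0.2])))  # confident and wrong: ~2.1
```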

Section 9
- Recap (00:03:00)
- No closed-form solution (00:02:00)
- Naive algorithm (00:04:00)
- Fog analogy (00:05:00)
- Gradient descent overview (00:03:00)
- The gradient (00:06:00)
- Numerical calculation (00:02:00)
- Parameter update (00:04:00)
- Convergence (00:02:00)
- Analytical solution (00:02:00)
- [Optional] Interpreting the analytical solution (00:05:00)
- Gradient descent conditions (00:03:00)
- Beyond vanilla gradient descent (00:03:00)
- Code time (00:07:00; see the sketch after this section)
- Reading the documentation (00:11:00)
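A bare-bones gradient descent loop on a 1-D convex function, showing the parameter update and a convergence check; plain Python, not course code.

```python
# Vanilla gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)   # analytical derivative of (w - 3)^2

w, lr = 0.0, 0.1             # initial guess and learning rate
for step in range(1000):
    g = grad(w)
    if abs(g) < 1e-8:        # convergence: gradient has (almost) vanished
        break
    w -= lr * g              # update: step downhill, against the gradient
print(step, w)               # reaches ~3.0 in well under 1000 steps
```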

Section 10
- Binary classification and class imbalance (00:06:00)
- Assessing performance (00:04:00)
- Accuracy (00:07:00)
- Accuracy with different class importance (00:04:00)
- Precision and Recall (00:07:00)
- Sensitivity and Specificity (00:03:00)
- F-measure and other combined metrics (00:05:00)
- ROC curve (00:07:00)
- Area under the ROC curve (00:06:00; see the sketch after this section)
- Custom metric (important stuff!) (00:06:00)
- Other custom metrics (00:03:00)
- Bad data science process (00:04:00)
- Data rebalancing (avoid doing this!) (00:06:00)
- Stratified split (00:03:00)
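The headline metrics of this section on one toy example, assuming scikit-learn; the labels and scores are made up, with positives deliberately rare.

```python
# Accuracy, precision, recall, F1 and ROC AUC on an imbalanced toy example.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = np.array([0, 0, 0, 0, 1, 1])              # few positives
y_pred  = np.array([0, 0, 0, 1, 1, 0])              # thresholded predictions
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.9, 0.4])  # predicted probabilities

print(accuracy_score(y_true, y_pred))    # can look fine even for poor models
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN), a.k.a. sensitivity
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))    # threshold-free, uses the raw scores
```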

Section 11
- The inverted MNIST dataset (00:04:00)
- The problem with linear models (00:05:00)
- Neurons (00:03:00)
- Multi-layer perceptron (MLP) for binary classification (00:05:00)
- MLP for regression (00:02:00)
- MLP for multiclass classification (00:01:00)
- Hidden layers (00:01:00)
- Activation functions (00:03:00)
- Decision boundary (00:02:00)
- Loss function (00:03:00)
- Intro to neural network training (00:03:00)
- Parameter initialization (00:03:00)
- Saturation (00:05:00)
- Non-convexity (00:04:00)
- Stochastic gradient descent (SGD) (00:05:00)
- More on SGD (00:07:00)
- Code time! (00:13:00; see the sketch after this section)
- Backpropagation (00:11:00)
- The problem with MLPs (00:04:00)
- Deep learning (00:09:00)
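A tiny MLP trained with SGD on XOR, a problem no linear model can solve. This assumes scikit-learn's MLPClassifier; the architecture and learning rate are illustrative guesses.

```python
# One hidden layer is enough to carve a non-linear decision boundary for XOR.
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])   # XOR labels: not linearly separable

mlp = MLPClassifier(hidden_layer_sizes=(8,),   # 8 hidden neurons
                    activation="tanh",
                    solver="sgd",              # stochastic gradient descent
                    learning_rate_init=0.1,
                    max_iter=5000,
                    random_state=0)
mlp.fit(X, y)          # gradients come from backpropagation
print(mlp.predict(X))  # ideally [0 1 1 0]; non-convexity means a run can get stuck
```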

Section 12
- Decision trees (00:04:00)
- Building decision trees (00:09:00)
- Stopping tree growth (00:03:00)
- Pros and cons of decision trees (00:08:00)
- Decision trees for classification (00:07:00)
- Decision boundary (00:01:00)
- Bagging (00:04:00)
- Random forests (00:06:00)
- Gradient-boosted trees for regression (00:07:00)
- Gradient-boosted trees for classification [optional] (00:04:00)
- How to use gradient-boosted trees (00:03:00; see the sketch after this section)
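A side-by-side sketch of the three model families in this section; assumes scikit-learn, with an illustrative dataset and hyperparameters.

```python
# Single tree vs. bagged trees (random forest) vs. gradient-boosted trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (DecisionTreeClassifier(max_depth=5, random_state=0),       # capped depth stops growth
              RandomForestClassifier(n_estimators=100, random_state=0),  # bagging + feature subsampling
              GradientBoostingClassifier(random_state=0)):               # shallow trees fit sequentially
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(model.score(X_te, y_te), 3))
```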

Section 13
- Nearest neighbor classification (00:03:00)
- K nearest neighbors (00:03:00)
- Disadvantages of k-NN (00:04:00)
- Recommendation systems (collaborative filtering) (00:03:00)
- Introduction to Support Vector Machines (SVMs) (00:05:00)
- Maximum margin (00:02:00)
- Soft margin (00:02:00)
- SVM vs Logistic Model (support vectors) (00:03:00)
- Alternative SVM formulation (00:06:00)
- Dot product (00:02:00)
- Non-linearly separable data (00:03:00)
- Kernel trick (polynomial) (00:10:00)
- RBF kernel (00:02:00)
- SVM remarks (00:06:00; see the sketch after this section)
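k-NN and kernel SVMs on concentric circles, a classic non-linearly-separable dataset; assumes scikit-learn, with illustrative hyperparameters.

```python
# k-NN needs little training; SVMs use the kernel trick to bend the boundary.
from sklearn.datasets import make_circles
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

for model in (KNeighborsClassifier(n_neighbors=5),   # majority vote of 5 nearest points
              SVC(kernel="rbf", C=1.0),              # RBF kernel
              SVC(kernel="poly", degree=2, C=1.0)):  # polynomial kernel, degree 2
    model.fit(X, y)
    print(type(model).__name__, round(model.score(X, y), 3))
```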

Section 14
- Intro to unsupervised learning (00:01:00)
- Clustering (00:03:00)
- K-means clustering (00:10:00)
- K-means application example (00:03:00)
- Elbow method (00:02:00)
- Clustering remarks (00:07:00)
- Intro to dimensionality reduction (00:05:00)
- PCA (principal component analysis) (00:08:00)
- PCA remarks (00:03:00)
- Code time (PCA) (00:13:00; see the sketch after this section)
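K-means plus PCA in one sketch: cluster in the original space, then project to 2-D for inspection; assumes scikit-learn and synthetic blobs.

```python
# Cluster 5-D data with k-means, then reduce to 2-D with PCA for plotting.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)   # each point assigned to its nearest centroid
print(km.inertia_)           # within-cluster sum of squares; the elbow method plots this vs. k

X2 = PCA(n_components=2).fit_transform(X)   # top-2 directions of variance
print(X2[:3], labels[:3])                   # 2-D coordinates ready for a scatter plot
```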

Section 15
- Missing data (00:02:00)
- Imputation (00:04:00)
- Imputer within pipeline (00:04:00)
- One-Hot encoding (00:05:00)
- Ordinal encoding (00:03:00)
- How to combine pipelines (00:04:00)
- Code sample (00:08:00; see the sketch after this section)
- Feature Engineering (00:07:00)
- Features for Natural Language Processing (NLP) (00:11:00)
- Anatomy of a Data Science Project (00:01:00)
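A sketch that ties the section together: imputation and encoding handled per column type, all inside one pipeline. Assumes scikit-learn; the column names and data are invented.

```python
# Numeric columns: impute + scale. Categorical columns: impute + one-hot encode.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age":  [25, np.nan, 47, 33],               # missing numeric value
                   "city": ["Paris", "Lyon", None, "Paris"]})  # missing category
y = np.array([0, 1, 1, 0])

numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
categorical = make_pipeline(SimpleImputer(strategy="most_frequent"),
                            OneHotEncoder(handle_unknown="ignore"))

pre = ColumnTransformer([("num", numeric, ["age"]),
                         ("cat", categorical, ["city"])])
model = make_pipeline(pre, LogisticRegression())
model.fit(df, y)         # every preprocessing step is fit on training data only
print(model.predict(df))
```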