Course Curriculum
- 15 sections
- 190 lectures
- 15 hours, 50 minutes total length
- Modeling an epidemic (00:08:00)
- The machine learning recipe (00:06:00)
- The components of a machine learning model (00:02:00)
- Why model? (00:03:00)
- On assumptions and can we get rid of them? (00:09:00)
- The case of AlphaZero (00:11:00)
- Overfitting/underfitting/bias/variance (00:11:00)
- Why use machine learning (00:05:00)
- The InsureMe challenge (00:06:00)
- Supervised learning (00:05:00)
- Linear assumption (00:03:00)
- Linear regression template (00:07:00)
- Non-linear vs proportional vs linear (00:05:00)
- Linear regression template revisited (00:04:00)
- Loss function (00:03:00)
- Training algorithm (00:08:00)
- Code time (00:15:00)
- R squared (00:06:00)
- Why use a linear model? (00:04:00)
- Introduction to scaling (00:06:00)
- Min-max scaling (00:03:00)
- Code time (min-max scaling) (00:09:00)
- The problem with min-max scaling (00:03:00)
- What’s your IQ? (00:11:00)
- Standard scaling (00:04:00)
- Code time (standard scaling) (00:02:00)
- Model before and after scaling (00:05:00)
- Inference time (00:07:00)
- Pipelines (00:03:00)
- Code time (pipelines) (00:05:00)
- Spurious correlations (00:04:00)
- L2 regularization (00:10:00)
- Code time (L2 regularization) (00:05:00)
- L2 results (00:02:00)
- L1 regularization (00:06:00)
- Code time (L1 regularization) (00:04:00)
- L1 results (00:02:00)
- Why does L1 encourage zeros? (00:09:00)
- L1 vs L2: Which one is best? (00:01:00)
- Introduction to validation (00:02:00)
- Why not evaluate model on training data (00:06:00)
- The validation set (00:05:00)
- Code time (validation set) (00:08:00)
- Error curves (00:06:00)
- Model selection (00:06:00)
- The problem with model selection (00:06:00)
- Tainted validation set (00:05:00)
- Monkeys with typewriters (00:03:00)
- My own validation epic fail (00:07:00)
- The test set (00:06:00)
- What if the model doesn’t pass the test? (00:05:00)
- How not to be fooled by randomness (00:02:00)
- Cross-validation (00:04:00)
- Code time (cross validation) (00:07:00)
- Cross-validation results summary (00:02:00)
- AutoML (00:05:00)
- Is AutoML a good idea? (00:05:00)
- Red flags: Don’t do this! (00:07:00)
- Red flags summary and what to do instead (00:05:00)
- Your job as a data scientist (00:03:00)
- Intro and recap (00:02:00)
- Mistake #1: Data leakage (00:05:00)
- The golden rule (00:04:00)
- Helpful trick (feature importance) (00:02:00)
- Real example of data leakage (part 1) (00:05:00)
- Real example of data leakage (part 2) (00:05:00)
- Another (funny) example of data leakage (00:02:00)
- Mistake #2: Random split of dependent data (00:05:00)
- Another example (insurance data) (00:05:00)
- Mistake #3: Look-Ahead Bias (00:06:00)
- Example solutions to Look-Ahead Bias (00:02:00)
- Consequences of Look-Ahead Bias (00:02:00)
- How to split data to avoid Look-Ahead Bias (00:03:00)
- Cross-validation with temporally related data (00:03:00)
- Mistake #4: Building model for one thing, using it for something else (00:04:00)
- Sketchy rationale (00:06:00)
- Why this matters for your career and job search (00:04:00)
- Classifying images of handwritten digits (00:07:00)
- Why the usual regression doesn’t work (00:04:00)
- Machine learning recipe recap (00:02:00)
- Logistic model template (binary) (00:13:00)
- Decision function and boundary (binary) (00:05:00)
- Logistic model template (multiclass) (00:14:00)
- Decision function and boundary (multi-class) (00:01:00)
- Summary: binary vs multiclass (00:01:00)
- Code time! (00:20:00)
- Why the logistic model is often called logistic regression (00:05:00)
- One vs Rest, One vs One (00:05:00)
- Where we’re at (00:02:00)
- Brier score and why it doesn’t work (00:06:00)
- The likelihood function (00:11:00)
- Optimization task and numerical stability (00:03:00)
- Let’s improve the loss function (00:09:00)
- Loss value examples (00:05:00)
- Adding regularization (00:02:00)
- Binary cross-entropy loss (00:03:00)
- Recap (00:03:00)
- No closed-form solution (00:02:00)
- Naive algorithm (00:04:00)
- Fog analogy (00:05:00)
- Gradient descent overview (00:03:00)
- The gradient (00:06:00)
- Numerical calculation (00:02:00)
- Parameter update (00:04:00)
- Convergence (00:02:00)
- Analytical solution (00:02:00)
- [Optional] Interpreting analytical solution (00:05:00)
- Gradient descent conditions (00:03:00)
- Beyond vanilla gradient descent (00:03:00)
- Code time (00:07:00)
- Reading the documentation (00:11:00)
- Binary classification and class imbalance (00:06:00)
- Assessing performance (00:04:00)
- Accuracy (00:07:00)
- Accuracy with different class importance (00:04:00)
- Precision and Recall (00:07:00)
- Sensitivity and Specificity (00:03:00)
- F-measure and other combined metrics (00:05:00)
- ROC curve (00:07:00)
- Area under the ROC curve (00:06:00)
- Custom metric (important stuff!) (00:06:00)
- Other custom metrics (00:03:00)
- Bad data science process (00:04:00)
- Data rebalancing (avoid doing this!) (00:06:00)
- Stratified split (00:03:00)
- The inverted MNIST dataset (00:04:00)
- The problem with linear models (00:05:00)
- Neurons (00:03:00)
- Multi-layer perceptron (MLP) for binary classification (00:05:00)
- MLP for regression (00:02:00)
- MLP for multi-class classification (00:01:00)
- Hidden layers (00:01:00)
- Activation functions (00:03:00)
- Decision boundary (00:02:00)
- Loss function (00:03:00)
- Intro to neural network training (00:03:00)
- Parameter initialization (00:03:00)
- Saturation (00:05:00)
- Non-convexity (00:04:00)
- Stochastic gradient descent (SGD) (00:05:00)
- More on SGD (00:07:00)
- Code time! (00:13:00)
- Backpropagation (00:11:00)
- The problem with MLPs (00:04:00)
- Deep learning (00:09:00)
- Decision trees (00:04:00)
- Building decision trees (00:09:00)
- Stopping tree growth (00:03:00)
- Pros and cons of decision trees (00:08:00)
- Decision trees for classification (00:07:00)
- Decision boundary (00:01:00)
- Bagging (00:04:00)
- Random forests (00:06:00)
- Gradient-boosted trees for regression (00:07:00)
- Gradient-boosted trees for classification [optional] (00:04:00)
- How to use gradient-boosted trees (00:03:00)
- Nearest neighbor classification (00:03:00)
- K nearest neighbors (00:03:00)
- Disadvantages of k-NN (00:04:00)
- Recommendation systems (collaborative filtering) (00:03:00)
- Introduction to Support Vector Machines (SVMs) (00:05:00)
- Maximum margin (00:02:00)
- Soft margin (00:02:00)
- SVM vs Logistic Model (support vectors) (00:03:00)
- Alternative SVM formulation (00:06:00)
- Dot product (00:02:00)
- Non-linearly separable data (00:03:00)
- Kernel trick (polynomial) (00:10:00)
- RBF kernel (00:02:00)
- SVM remarks (00:06:00)
- Intro to unsupervised learning (00:01:00)
- Clustering (00:03:00)
- K-means clustering (00:10:00)
- K-means application example (00:03:00)
- Elbow method (00:02:00)
- Clustering remarks (00:07:00)
- Intro to dimensionality reduction (00:05:00)
- PCA (principal component analysis) (00:08:00)
- PCA remarks (00:03:00)
- Code time (PCA) (00:13:00)
- Missing data (00:02:00)
- Imputation (00:04:00)
- Imputer within pipeline (00:04:00)
- One-Hot encoding (00:05:00)
- Ordinal encoding (00:03:00)
- How to combine pipelines (00:04:00)
- Code sample (00:08:00)
- Feature Engineering (00:07:00)
- Features for Natural Language Processing (NLP) (00:11:00)
- Anatomy of a Data Science Project (00:01:00)