# Fast.ai Practical Deep Learning for Coders, Part 1 (2018): course overview

The purpose of this post is to highlight the contents of the fast.ai 2018 course Practical Deep Learning for Coders, Part 1. I hope it helps anybody who has decided to take this course or is having second thoughts about it. Note that the content structure is extracted from the video timelines provided in the wiki link for each lesson.

**Lesson 1: Recognising cats and dogs**

**Lesson wiki:** http://forums.fast.ai/t/wiki-lesson-1/9398

- The “Top-Down” approach to study, vs the “Bottom-Up”
- Jupyter Notebook lesson1.ipynb ‘Dogs vs Cats’
- Running the first Deep Learning model with the ‘resnet34’ architecture, epoch, accuracy on validation set.
- Analyzing results: looking at pictures
- What is Deep Learning?
- The Universal Approximation Theorem, and examples used by Google.
- What is actually going on in a Deep Learning model, with convolutional network.
- Adding a Non-Linear Layer to our model, sigmoid or ReLU (rectified linear unit), SGD (Stochastic Gradient Descent)
- Visualizing and Understanding Convolutional Networks
- Cyclical learning rates, available in the fastai library as ‘lr_find’ (the learning rate finder).
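
To make the two non-linearities mentioned in the list above concrete, here is a minimal sketch in plain Python. It is only an illustration of the standard definitions, not code from the course notebooks; the function names are my own.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Compare the two activations on a few sample inputs.
for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), round(sigmoid(x), 3))
```

ReLU leaves positive inputs unchanged, while sigmoid maps every input to a value between 0 and 1.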

**Lesson 2: Convolutional Neural Networks**

**Lesson wiki:** http://forums.fast.ai/t/wiki-lesson-2/9399

- Lesson 1 review, Image classifier
- What is a Learning Rate (LR), LR Finder, mini-batch, ‘learn.sched.plot_lr()’ & ‘learn.sched.plot()’, ADAM optimizer intro
- How to improve your model with more data, avoid overfitting, use different data augmentation
- Data Augmentation (DA), ‘tfms=’ and ‘precompute=True’, visual examples of Layer detection and activation in networks pre-trained on ImageNet.
- Learning rate annealing, cosine annealing, Stochastic Gradient Descent (SGD) with Restart approach, Ensemble; “Jeremy’s superpower”
- Save your model weights with ‘learn.save()’ & ‘learn.load()’, the folders ‘tmp’ & ‘models’
- Fine-tuning and differential learning rate
- Why Fast.ai switched from Keras+TensorFlow to PyTorch, creating a high-level library on top.
- Classification or Confusion matrix
- Download/import data from Kaggle with ‘kaggle-cli’, using CSV files with Pandas.
- Undocumented Pro-Tip from Jeremy: train on a small image size, then use ‘learn.set_data()’ with larger images (for example 299 instead of 224 pixels)
- Using Test Time Augmentation
- Amazon Satellite imagery competition on Kaggle.
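
The cosine annealing and SGD-with-restarts items in the list above can be sketched as a small schedule function. This is a simplified illustration under my own assumptions (a fixed cycle length and hypothetical parameter names `cycle_len`, `lr_max`, `lr_min`); it is not the fastai implementation.

```python
import math


def cosine_annealed_lr(step: int, cycle_len: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate at `step` for cosine annealing with warm restarts.

    Within each cycle the rate follows half a cosine wave from lr_max
    down towards lr_min, then jumps back to lr_max (the "restart").
    """
    pos = step % cycle_len  # position inside the current cycle
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * pos / cycle_len))
    return lr_min + (lr_max - lr_min) * cos_factor


# Two cycles of 5 steps each: the rate decays, then restarts at step 5.
schedule = [cosine_annealed_lr(s, cycle_len=5, lr_max=0.1) for s in range(10)]
print([round(lr, 4) for lr in schedule])
```

The restart is what lets the optimizer jump out of a narrow minimum and settle in a flatter one, which is the motivation discussed in the lesson.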

**Lesson 3: Improving your image classifier**

**Lesson wiki:** http://forums.fast.ai/t/wiki-lesson-3/9401

- How to complete the Dog breeds detection assignment.
- What “precompute=True” and “learn.bn_freeze” mean
- Intro & comparison to Keras with TensorFlow
- Porting PyTorch fast.ai library to Keras+TensorFlow project
- The theory behind Convolutional Networks, and Otavio Good demo (Word Lens)
- ConvNet demo with Excel, filter, Hidden layer, Maxpool, Dense weights, Fully-Connected layer, output, probabilities adding to 1, activation function, Softmax
- Multi-label classification with Amazon Satellite competition
- Setting different learning rates for different layers
- ‘sigmoid’ activation for multi-label
- Training only the last layers, not the initial freeze/frozen ones from ImageNet models
- Working with Structured Data “Corporacion Favorita Grocery Sales Forecasting” based on the Rossmann Stores competition
- Split Rossmann columns in two types: categorical vs continuous
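
The list above mentions softmax (probabilities adding to 1) for single-label classification and sigmoid for multi-label classification. The difference can be sketched in a few lines of plain Python; the logits below are made-up values for illustration only.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (one label per item)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Score each label independently in (0, 1) (several labels may apply)."""
    return 1.0 / (1.0 + math.exp(-z))


logits = [2.0, 1.0, 0.1]  # hypothetical outputs of the last layer
single_label = softmax(logits)
multi_label = [sigmoid(z) for z in logits]

print([round(p, 3) for p in single_label], round(sum(single_label), 3))
print([round(p, 3) for p in multi_label], round(sum(multi_label), 3))
```

Softmax forces the classes to compete for a total probability of 1, whereas the sigmoid outputs are independent, so a satellite image can be scored as both, say, "cloudy" and "road" at the same time.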

**Lesson 4: Structured, Time series and Language models**

**Lesson wiki:** http://forums.fast.ai/t/wiki-lesson-4/9402

- Dropout discussion with “Dog_Breeds”
- Why monitor the Loss / LogLoss vs Accuracy?
- Looking at Structured and Time Series data with Rossmann Kaggle competition
- RMSPE: Root Mean Square Percentage Error
- Dealing with categorical variables
- Intro to Natural Language Processing (NLP)
- Creating a Language Model with IMDB dataset
- Tokenize: splitting a sentence into an array of tokens
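
As a reference for the RMSPE metric named in the list above, here is a minimal sketch of the formula in plain Python. Skipping rows whose true value is zero is an assumption on my part, made to avoid dividing by zero; check the competition rules for the exact scoring definition.

```python
import math


def rmspe(y_true, y_pred):
    """Root Mean Square Percentage Error.

    Each error is measured relative to the true value, so being off by 10
    matters more when the target is 100 than when it is 10,000.
    Rows with a true value of 0 are skipped (assumption, see lead-in).
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return math.sqrt(sum(((t - p) / t) ** 2 for t, p in pairs) / len(pairs))


# Both predictions are off by 10% of the true value.
print(rmspe([100.0, 200.0], [110.0, 180.0]))
```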

**Lesson 5: Collaborative filtering, Inside the training loop**

**Lesson wiki:** http://forums.fast.ai/t/wiki-lesson-5/9403

- MovieLens dataset: build an effective collaborative filtering model from scratch
- Why a matrix factorization and not a neural net? Using Excel solver for Gradient Descent ‘GRG Nonlinear’
- Kaiming He Initialization (via DeepGrid)
- Improving the MovieLens model in Excel again, adding a constant for movies and users called “a bias”
- Squashing the ratings between 1 and 5, with Sigmoid function
- What is happening inside the “Training Loop”
- Spreadsheet ‘Momentum’ tab
- Spreadsheet ‘Adam’ tab
- Beyond Dropout: ‘Weight-decay’ or L2 regularization
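
The collaborative filtering items in the list above (a dot product of user and movie factors, a bias per user and per movie, and a sigmoid that squashes the result into the rating range) can be combined into one small prediction function. This is a sketch with made-up numbers and my own function names, not the course's PyTorch model.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_rating(user_vec, movie_vec, user_bias, movie_bias, lo=1.0, hi=5.0):
    """Predict a rating from latent factors plus biases.

    The raw score is the dot product of the two factor vectors plus both
    biases; the sigmoid then rescales it to lie between `lo` and `hi`.
    """
    raw = sum(u * m for u, m in zip(user_vec, movie_vec)) + user_bias + movie_bias
    return lo + (hi - lo) * sigmoid(raw)


# Hypothetical 3-factor embeddings for one user and one movie.
rating = predict_rating([0.5, -0.2, 1.0], [1.0, 0.3, 0.8], user_bias=0.1, movie_bias=-0.3)
print(round(rating, 2))
```

Without the squashing step the model can predict ratings such as 6.3 or -0.5, which can never be correct; the sigmoid removes that wasted capacity.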

**Lesson 6: Interpreting Embeddings, RNN from scratch**

**Lesson wiki:** http://forums.fast.ai/t/wiki-lesson-6/9404

- Embedding interpretation, using ‘PCA’ from ‘sklearn.decomposition’ for Linear Algebra
- Looking at the “Rossmann Retail / Store” Kaggle competition with the ‘Entity Embeddings of Categorical Variables’ paper.
- “Rossmann” Data Cleaning / Feature Engineering
- How to write something different from what the fastai library provides
- More into SGD with ‘lesson6-sgd.ipynb’ notebook, a Linear Regression problem with continuous outputs. ‘a*x+b’ & mean squared error (MSE) loss function with ‘y_hat’
- Gradient Descent implemented in PyTorch, ‘loss.backward()’, ‘.grad.data.zero_()’ in ‘optim.SGD’ class
- Gradient Descent with Numpy
- Basic NN with single hidden layer (rectangle, arrow, circle, triangle), by Jeremy, Image CNN with single dense hidden layer.
- RNN with PyTorch, question: “What does the hidden state represent?”
- Multi-output model
- Sequence length vs batch size
- The Identity Matrix (init!), a paper co-authored by Geoffrey Hinton: “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units”
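
The linear regression item in the list above (fitting ‘a*x+b’ with a mean squared error loss by gradient descent) can be sketched as follows. I use plain Python lists rather than NumPy or PyTorch so the example has no dependencies; the data, learning rate, and step count are illustrative choices, not the notebook's values.

```python
def fit_line(xs, ys, lr=0.1, steps=2000):
    """Fit y = a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        y_hat = [a * x + b for x in xs]
        # Gradients of mean((y_hat - y)^2) with respect to a and b.
        grad_a = sum(2.0 * (yh - y) * x for yh, y, x in zip(y_hat, ys, xs)) / n
        grad_b = sum(2.0 * (yh - y) for yh, y in zip(y_hat, ys)) / n
        # Step against the gradient.
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Noise-free data generated from a known line, so the fit can be checked.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [3.0 * x + 8.0 for x in xs]
a, b = fit_line(xs, ys)
print(round(a, 4), round(b, 4))
```

In PyTorch the two gradient lines are what ‘loss.backward()’ computes automatically, and the update lines are what the optimizer's step performs.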

**Lesson 7: Resnets from scratch**

**Lesson wiki:** http://forums.fast.ai/t/wiki-lesson-7/9405

- Building the RNN model with ‘self.init_hidden(bs)’ and ‘self.h’, the “back prop through time (BPTT)” approach
- Creating mini-batches
- How to create Nietzsche training/validation data
- Dealing with PyTorch not accepting a “Rank 3 Tensor”, only Rank 2 or 4, ‘F.log_softmax()’
- Intro to GRU cell (RNNCell has a gradient explosion problem, i.e. you need to use a low learning rate and a small BPTT)
- Long Short Term Memory (LSTM), ‘LayerOptimizer()’, Cosine Annealing ‘CosAnneal()’
- Computer Vision with CIFAR 10 and ‘lesson7-cifar10.ipynb’ notebook, Why study research on CIFAR 10 vs ImageNet vs MNIST
- Looking at a Fully Connected Model, based on a notebook from student ‘Kerem Turgutlu’, then a CNN model (with Excel demo)
- Refactored the model with new class ‘ConvLayer()’ and ‘padding’
- Using Batch Normalization (BatchNorm) to make the model more resilient, ‘BnLayer()’ and ‘ConvBnNet()’
- Deep BatchNorm
- Replace the model with ResNet, class ‘ResnetLayer()’, using ‘boosting’
- ‘Bottleneck’ layer with ‘BnLayer()’, ‘ResNet 2’ with ‘Resnet2()’, Skip Connections.
- Class Activation Maps (CAM) of ‘Dogs v Cats’
- Questions to Jeremy: “Your journey into Deep Learning” and “How to keep up with important research for practitioners”, “If you intend to come to Part 2, you are expected to master all the techniques in Part 1”, Jeremy’s advice to master Part 1.
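
The core idea behind the ResNet layers and skip connections listed above is that a block outputs its input plus whatever the layer computes, i.e. `x + f(x)`. Here is a toy sketch using a small fully connected layer in plain Python; the lesson builds this with convolutional layers in PyTorch, and all names below are hypothetical.

```python
def linear(v, weights, bias):
    """A plain fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu_vec(v):
    return [max(0.0, x) for x in v]


def residual_block(x, weights, bias):
    """Skip connection: add the layer's output to its own input, x + f(x)."""
    fx = relu_vec(linear(x, weights, bias))
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, 2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
identity_w = [[1.0, 0.0], [0.0, 1.0]]

print(residual_block(x, zero_w, [0.0, 0.0]))      # layer contributes nothing
print(residual_block(x, identity_w, [0.0, 0.0]))  # layer adds a copy of x
```

When the layer's weights are zero, the block simply passes its input through, so each layer only has to learn a correction to what came before; this is the connection to ‘boosting’ made in the lesson.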

**Call to action:**

If you found this post helpful, please hold the clap button for as long as you feel like. I definitely recommend this course to anybody who wants to get started or improve in deep learning. The course content and learning approach are remarkable, hands-on and fun. I have published, and intend to publish, more posts about this course, so stay tuned and check them out.