Deep Learning – Spring 2021

 

Class Hours

Tue:  5:30 pm — 7:00 pm,  Thur: 7:15 pm — 8:45 pm
Location: LT-1

Office Hours and Contact Info.

Instructor: Mohsen Ali
Office Hours: to be announced
Email: mohsen.ali@itu.edu.pk

Teaching Assistant 1: Obaid Ullah Ahmad
Office Hours: TBA
Email: Obaidullah.ahmad@itu.edu.pk

Teaching Assistant 2: Hafiz Muhammad Abdullah Zia
Office Hours: TBA
Email: msds19087@itu.edu.pk

Teaching Assistant 3: Muhammad Hamad Akram
Office Hours: TBA
Email: msee19018@itu.edu.pk

Course Basics

Core Course
Credit Hours: 3
Offered to MSDS, MSCS, and BS students
Practical and hands-on approach
5 to 6 programming assignments

Prerequisites

Enthusiasm, Energy, and Imagination
Data Structures, Probability & Statistics, Linear Algebra and basic Calculus
Programming skills and desire to read & implement.

 

Course Overview

We are going to take the “get your hands dirty” approach: you will be given assignments and projects to implement ideas discussed in class. Projects and assignments will contain miniature versions of real-life applications and problems (e.g., can you train your computer to generate dialogue in the style of Shakespeare, convert your photo into a painting in the style of Monet, or perform sentiment analysis?).
The course will concentrate on developing both mathematical knowledge and implementation capabilities. We will start by training a single perceptron, move on to training deep neural networks, and study why training large networks is difficult and what the possible solutions are. After dipping our toes into deep belief networks and recurrent neural networks, we will look into applications of deep learning in three areas: text analysis, speech processing, and computer vision. The objective of this approach is to make you comfortable enough that you can understand various research problems and, if interested, implement deep learning-based applications.

Course Objectives

In the last few years, machine learning has matured from science fiction to reality. We live in a world where industry has already brought self-driving cars to reality, face recognizers that work at massive scale (Facebook), and speech translation systems that can translate from one language into many others simultaneously and in real time; more interestingly, we have machines that can learn to play Atari games much as we do.
Many of these victories have come from the exciting field of Deep Learning, a learning methodology based on the idea that the human mind captures details at multiple levels of abstraction. One property of deep learning is that it removes the burden of hand-designing features; instead, the network is tasked with finding an appropriate representation.

Grading Policy

  • 45% Assignments
  • 5% Class participation and Creating Notes
  • 20% Final Project
  • 10% Quizzes
  • 10% Midterm Exam
  • 10% Final Exam

Honor Code

All cases of academic misconduct will be forwarded to the disciplinary committee. All assignments are group-based unless explicitly specified otherwise by the instructor. In the words of Efros: let’s not embarrass ourselves.

Tentative and Rough Course Outline

 

Week 1: Introduction to Deep Learning
Difference between Machine Learning and Deep Learning; Basic Machine Learning: Linear & Logistic Regression

Week 2: Supervised Learning with Neural Networks
Deep Learning, Single and Multi-Layer Neural Networks, Perceptron Rule, Gradient Descent, Backpropagation, Loss Functions
Tutorial 1: Python/Numpy/Anaconda

Week 3: Hyperparameter Tuning, Regularization and Optimization
Parameters vs Hyperparameters, Why does regularization reduce overfitting?, Data Augmentation, Vanishing/Exploding Gradients, Weight Initialization Methods, Optimizers
Tutorial 2: Building a Linear Classifier

Week 4: Convolutional Neural Networks
Convolutional Filters, Pooling Layers, Classic CNNs: AlexNet, VGG, GoogLeNet, ResNet, DenseNet; Transfer Learning
Tutorial 4: CNN Visualization

Week 5: Deep Learning for Vision Problems
Object Localization & Detection, Bounding Box Predictions, Anchor Boxes, Region Proposal Networks, Detection Algorithms: RCNN, Faster RCNN, YOLO, SSD
Tutorial 5: Caffe & Object Detection

Week 6: Sequence Models
Recurrent Neural Networks (RNN), Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Bidirectional RNN, Backpropagation Through Time; Image Caption Generation, Machine Translation, Text Generation & Summarization, and Transformers
Tutorial 6: Image Captioning & Text Generation

Week 7: Auto-Encoders & Generative Models
Variational Auto-Encoders, Stacked Auto-Encoders, Denoising Auto-Encoders, Concept of Generative Adversarial Networks (GANs)

Week 8: Miscellaneous
Capsule Networks, Convolutional LSTM, Attention Networks, Restricted Boltzmann Machines, One-Shot Learning, Siamese Networks, Triplet Loss, Graph CNNs, Approximate and Energy-Efficient Design for Deep CNNs (Dr. Rehan Hafiz)

 

Course Notes

 

Date | Topics | Notes / Reading Material / Comments | News
9th Mar 2021: Introduction
Grading policy announced (see Grading Policy above).

Recommended Resources
  • Text Book
    • Deep Learning by Ian Goodfellow Link
    • Dive into Deep Learning by Aston Zhang et al. Link
  • Recommended Online Books
    • Machine Learning, Oxford – Nando de Freitas Link
    • Convolutional Neural Networks for Visual Recognition, Stanford (cs231n) Link
    • A curated list of courses (Recommended) Link
    • Deep Learning for Natural Language Processing, Stanford Link
  • Video Lectures
    • Essence of Neural Networks – 3Blue1Brown Link
    • Convolutional Neural Networks for Visual Recognition, Stanford (cs231n) – Video Lectures Link
    • Neural Networks and Deep Learning – deeplearning.ai – Link
11th Mar 2021: Unsupervised and Supervised Learning

Linear Regression

  • Introduction
  • Learning: supervised, unsupervised
  • Learning: discontinuous, continuous
  • Models: bias and variance
  • Linear Regression
    • Error Function in Linear Regression (see the sketch after this list)
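
To make the error function concrete, here is a minimal NumPy sketch (illustrative code, not course material; the data and learning rate are made up) that fits a line by gradient descent on the mean squared error:

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise (made up for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + 0.1 * rng.standard_normal(100)

# Model y_hat = w*x + b; error = mean squared error.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = (w * x + b) - y
    w -= lr * 2 * np.mean(err * x)   # dL/dw
    b -= lr * 2 * np.mean(err)       # dL/db

print(f"w = {w:.2f}, b = {b:.2f}")   # should approach 2 and 1
```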

Assigned Readings:

Recommended Readings:

Refresh:

Concepts of local minima, local maxima, convex functions, concave functions, critical points, chain rule, saddle point

16th Mar 2021: No Class. Makeup class will be announced soon.
18th Mar 2021: No Class. Makeup class(es) will take place on Saturday, 27th March 2021.
23rd Mar 2021: No Class on account of Pakistan Resolution Day. Makeup class will be announced soon.
25th Mar 2021: Linear Regression
    • Linear Regression
      • Where to use Machine Learning?
      • What basic points to ask?
        • Problem / objective
        • Data
        • Model that satisfies the objective
          • Parameters of the model
        • Loss function
        • How to optimize the loss function
      • Error Function in Linear Regression
      • What are convex functions?
      • Concept of local and global minima & maxima, saddle points
      • Optimization
        • Differentiation/gradient
        • Critical points
        • Differentiation in the single-variable case
        • More than one variable (see the derivation after this list)
          • Hessian
          • Positive semi-definite
          • How to find maxima?
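
For reference, the gradient and Hessian of the squared-error objective in the multi-variable case can be written out as follows (standard results, stated here in design-matrix notation):

```latex
% Least-squares objective for linear regression with design matrix
% X in R^{n x d}, weight vector w, and targets y:
L(\mathbf{w}) = \tfrac{1}{n}\,\lVert X\mathbf{w} - \mathbf{y}\rVert^{2},
\qquad
\nabla L(\mathbf{w}) = \tfrac{2}{n}\,X^{\top}\!\left(X\mathbf{w} - \mathbf{y}\right),
\qquad
\nabla^{2} L(\mathbf{w}) = \tfrac{2}{n}\,X^{\top}X .
% X^T X is positive semi-definite, so L is convex and any critical point
% (where the gradient vanishes) is a global minimum.
```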

    Assigned Readings:

    Recommended Readings:

Assignment 1
27th Mar 2021: Optimization, Gradient Descent and Logistic Regression
30th Mar 2021: Logistic Regression, Classification, Loss Functions
  • Classification
    • Linear classification
    • Binary classification using logistic regression
  • Logistic Regression
    • Squashing function
    • Sigmoid
    • Cross-entropy loss function
  • Classification and its link with probability
  • Maximum Likelihood Estimation
  • Multi-class classification
    • Softmax (see the sketch after this list)
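
As a companion to the list above, here is a minimal NumPy sketch (illustrative, with made-up logits) of the sigmoid, a numerically stable softmax, and the cross-entropy loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # probs: (n, k) class probabilities; labels: (n,) integer class ids
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 0.3]])
labels = np.array([0, 2])
print(cross_entropy(softmax(logits), labels))
```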

Take-home task:

  • Compute the gradient of the cross-entropy loss function with softmax.
  • Is the cross-entropy loss function with softmax a convex function?

Assigned Readings:

Recommended Reading: Softmax and Cross Entropy

Assigned Readings (Linear Algebra):

Linear Algebra for Machine Learning

1st Apr 2021: Multi-Class Classification, Optimization and Gradient Descent
  • Multi-class classification
    • Softmax and cross-entropy loss
    • Why is softmax used but not sigmoid?
  • Optimization of the loss function
    • Learning rate
    • Momentum
  • Gradient Descent
    • Stochastic Gradient Descent
    • Batch Gradient Descent
    • Mini-batch Gradient Descent
  • Gradient Descent Optimization Algorithms (a minimal momentum sketch follows this list)
    • Momentum
    • Adagrad
    • Adadelta
    • RMSprop
    • Adam
    • AdaMax
  • Visualization of algorithms
  • Which optimizer to use?
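
A minimal sketch of the momentum update (illustrative; the toy objective f(w) = w^2 and the hyperparameters are made up):

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.05, beta=0.9):
    """One SGD update with (heavy-ball) momentum."""
    v = beta * v - lr * grad    # velocity accumulates past gradients
    return w + v, v

# Minimise the toy objective f(w) = w^2, whose gradient is 2w.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = sgd_momentum_step(w, grad=2 * w, v=v)
print(round(w, 4))              # oscillates but approaches 0
```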

Recommended Readings:

Gradient Descent: Video Lecture from Coursera, Andrew Ng

2nd Apr 2021: Neural Networks
  • Feed-forward Neural Networks
    • Perceptron
      • OR function
      • AND function
      • XOR function
    • Multiple perceptrons
    • Multi-layer neural networks
      • Nonlinear classification (circle)
      • Role of activation functions
        • hard-limit, sigmoid, tanh, ReLU, leaky ReLU, Maxout, ELU
    • Input, output and hidden layers
    • Why do we need nonlinear activation functions?
    • Forward pass as matrix multiplication (see the sketch after this list)
    • Decision boundaries
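
To show the forward pass as matrix multiplication, here is a minimal NumPy sketch of a two-layer network (illustrative sizes, random untrained weights):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Tiny 2-layer network: the forward pass is matrix multiplication plus
# elementwise nonlinearities. Sizes are illustrative.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 4)), np.zeros(4)
W2, b2 = rng.standard_normal((4, 1)), np.zeros(1)

def forward(x):
    h = relu(x @ W1 + b1)                          # hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output

x = np.array([[0.0, 1.0],
              [1.0, 1.0]])                         # a batch of two inputs
print(forward(x))                                  # two values in (0, 1)
```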

Assigned Readings:

  • Chapter 6: Deep Feedforward Networks; Book: Deep Learning by Ian Goodfellow

Up to and including Section 6.3.

Recommended Readings:

Perceptron Rule

3rd Apr 2021: Backpropagation
  • How are weights determined?
  • Chain rule
  • Back-propagation algorithm (see the sketch after this list)
  • Training neural networks
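
A minimal NumPy sketch of backpropagation through one hidden layer (illustrative: random data, squared-error loss, biases omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: one hidden layer (sigmoid), linear output, squared error.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3))            # batch of 8 inputs
t = rng.standard_normal((8, 1))            # made-up regression targets
W1 = rng.standard_normal((3, 5))
W2 = rng.standard_normal((5, 1))

for _ in range(200):
    # Forward pass
    h = sigmoid(x @ W1)
    y = h @ W2
    # Backward pass: chain rule applied layer by layer
    dy  = 2 * (y - t) / len(x)             # dL/dy for mean squared error
    dW2 = h.T @ dy                         # dL/dW2
    dh  = dy @ W2.T                        # dL/dh
    dW1 = x.T @ (dh * h * (1 - h))         # sigmoid'(z) = h * (1 - h)
    W1 -= 0.5 * dW1
    W2 -= 0.5 * dW2

print(float(np.mean((sigmoid(x @ W1) @ W2 - t) ** 2)))  # loss after training
```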

Assigned Readings:

  • Chapter 6: Deep Feedforward Networks; Book: Deep Learning by Ian Goodfellow. Section 6.5 (complete).

Recommended Readings

Optional:

How to do backpropagation in a brain by Hinton

Video Lecture:

Lecture 4: Backpropagation; Dhruv Batra 

Lecture 10 – Neural Networks, Yaser Abu-Mostafa.

6th Apr 2021: Neural Network Training
  • How are weights determined?
  • Chain rule
  • Back-propagation algorithm
  • Weight initialization techniques
  • Activation Functions (see the sketch after this list)
    • ReLU
    • Sigmoid
    • Tanh
    • Swish
    • Leaky ReLU
    • ELU
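
For quick reference, NumPy one-liners for the activation functions above (an illustrative sketch):

```python
import numpy as np

def relu(z):               return np.maximum(0.0, z)
def leaky_relu(z, a=0.01): return np.where(z > 0, z, a * z)
def elu(z, a=1.0):         return np.where(z > 0, z, a * (np.exp(z) - 1))
def sigmoid(z):            return 1.0 / (1.0 + np.exp(-z))
def swish(z):              return z * sigmoid(z)   # also known as SiLU
def tanh(z):               return np.tanh(z)

z = np.linspace(-3, 3, 7)
for f in (relu, leaky_relu, elu, sigmoid, swish, tanh):
    print(f"{f.__name__:11s}", np.round(f(z), 3))
```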

Assigned Readings:

Assignment 2
8th Apr 2021: Weight Initialization and Batch Normalization
  • Weight Initialization
    • Random initialization
    • Initializing weights using a Gaussian distribution
    • If you are using ReLU activations, use the Kaiming/MSRA method.
    • If you are using tanh activations, use the Xavier initialization method.
  • Covariate Shift
    • During training, updating the weights of earlier layers changes the distribution of each layer’s inputs; this is called covariate shift.
  • Batch Normalization (see the sketch after this list)
    • Used to tackle the covariate shift problem.
    • Improves gradient flow.
    • Makes deeper networks much easier to train.
    • Allows higher learning rates and faster convergence.
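
A minimal NumPy sketch of the two initialization rules and a batch-normalization forward pass (illustrative sizes; the Xavier scale shown is one common variant):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128

# Kaiming/MSRA init (suits ReLU): variance 2 / fan_in.
W_kaiming = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
# Xavier init (suits tanh): variance 1 / fan_in (one common variant).
W_xavier = rng.standard_normal((fan_in, fan_out)) * np.sqrt(1.0 / fan_in)

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = rng.standard_normal((32, fan_out)) * 5 + 3   # shifted, scaled batch
out = batch_norm_forward(x, gamma=np.ones(fan_out), beta=np.zeros(fan_out))
print(out.mean(axis=0).round(3)[:3], out.std(axis=0).round(3)[:3])  # ~0, ~1
```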

Assigned Readings

Recommended Reading:

13th Apr 2021: Regularization and Dropout
  • Capacity of a network
  • Overfitting
  • How to avoid overfitting
  • Regularization
    • Bias/variance, overfitting, underfitting
    • L2/L1 regularization
  • Dropout (see the sketch after this list)
    • Concept of stochastic regularization
    • Forward / backward passes
  • Helping the network converge
    • Data augmentation
    • Learning rate decay
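
A minimal NumPy sketch of (inverted) dropout, which scales activations at training time so the test-time forward pass needs no change:

```python
import numpy as np

def dropout_forward(h, p_drop=0.5, train=True):
    """Inverted dropout: zero units randomly, rescale survivors."""
    if not train:
        return h
    mask = (np.random.rand(*h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask

h = np.ones((2, 8))
print(dropout_forward(h, p_drop=0.5))    # ~half zeros, survivors scaled to 2
print(dropout_forward(h, train=False))   # unchanged at test time
```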

Homework:

  • Nuclear Norm
  • L0 norm

Assigned Readings:

Video Lectures

  • Deep Learning – Lecture 4 – Nando de Freitas – Link

Recommended Readings:

15th Apr 2021: Texture and Convolution Filters
  • Texture / pattern
  • Why is texture important?
  • How to represent texture?
  • Texture can be useful in image recognition
  • Convolution
    • Convolution of two signals in continuous time
    • Convolution in discrete time
    • Convolution in image processing
  • Image Filtering (see the sketch after this list)
  • Applying filters to images using convolution
  • Box filter
  • Sharpening filters
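
A minimal NumPy sketch of applying filters to an image by convolution (strictly speaking, cross-correlation, which is what deep-learning libraries compute; the image and filters are made up):

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 'valid' 2-D cross-correlation of a grayscale image."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
box = np.ones((3, 3)) / 9.0                                  # averaging filter
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
print(conv2d(img, box))
print(conv2d(img, sharpen))
```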
20th Apr 2021: Filters and Convolutional Neural Networks
  • Image gradients
  • Derivative filters
  • Second-derivative filters
  • Gaussian filters
    • Blob detection
  • Filter banks
    • Filters to detect edges
    • Filters to detect bars
    • Filters to detect blobs
  • What do CNNs learn?
  • Convolution is shift invariant
  • Convolutional Neural Networks (assembled in the sketch after this list)
    • Why do we need convolutional neural networks?
    • Building blocks of a CNN
      • Convolutional layer
      • Pooling layer
      • Activation function
        • ReLU
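
Assembling the building blocks above into a minimal CNN, sketched in PyTorch with illustrative layer sizes (not a course-specified architecture):

```python
import torch
import torch.nn as nn

# A minimal CNN built from the blocks above (illustrative sizes).
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),                                   # activation
    nn.MaxPool2d(2),                             # pooling layer
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # classifier head
)

x = torch.randn(4, 1, 28, 28)                    # e.g. a batch of MNIST-sized images
print(model(x).shape)                            # torch.Size([4, 10])
```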

NOTES

Assigned Readings

Recommended Readings

22nd Apr 2021: CNNs
  • Making deep neural networks using convolution and pooling layers
  • Hyperparameters in CNNs
    • Number of layers
    • Size of features
    • Pooling window size
    • Stride
    • Number of neurons in fully connected layers
  • What is a receptive field? (see the calculator sketch after this list)
    • Why it matters for object recognition
    • Relationship with filter size and depth
    • How does the receptive field affect accuracy?
  • Increasing the receptive field
    • Using dilated convolutions
    • Max pooling
  • 1×1 Convolution
    • Feature fusion
    • Dimensionality reduction (bottleneck layer)
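
A small Python sketch of receptive-field bookkeeping for an assumed stack of layers (the layer list is made up for illustration):

```python
# Receptive-field bookkeeping for an assumed stack of conv/pool layers.
# Recurrences: r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where
# r is the receptive field and j the cumulative stride ("jump").
layers = [                       # (name, effective kernel k, stride s)
    ("conv 3x3",              3, 1),
    ("max-pool 2x2",          2, 2),
    ("conv 3x3",              3, 1),
    ("dilated conv 3x3, d=2", 5, 1),   # dilation 2 gives effective k = 5
]

r, j = 1, 1
for name, k, s in layers:
    r += (k - 1) * j
    j *= s
    print(f"{name:24s} receptive field = {r:2d}, jump = {j}")
```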

Assigned Readings

27th Apr 2021: Backpropagation in CNNs
  • Backpropagation for the convolutional layer
  • Backpropagation for the pooling layer
  • Converting an FC layer to a convolutional layer (see the sketch after this list)
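
A PyTorch sketch of the FC-to-conv conversion (illustrative sizes): an FC layer over a C×H×W input is equivalent to a convolution whose kernel covers the whole H×W extent.

```python
import torch
import torch.nn as nn

C, H, W, classes = 16, 7, 7, 10
fc = nn.Linear(C * H * W, classes)
conv = nn.Conv2d(C, classes, kernel_size=(H, W))

# Copy the FC weights into the conv kernel (same parameters, new shape).
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(classes, C, H, W))
    conv.bias.copy_(fc.bias)

x = torch.randn(2, C, H, W)
out_fc = fc(x.flatten(1))
out_conv = conv(x).flatten(1)
print(torch.allclose(out_fc, out_conv, atol=1e-6))  # True
```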

Reference 

Video Lecture:

Assignment 3 Deliverable 2
29th Apr 2021: Transfer Learning
  • Transfer Learning (see the sketch after this list)
    • Layers and hierarchical feature learning
    • Freezing and fine-tuning
  • LeNet, AlexNet, VGG, Inception Net, ResNet
  • Larger models and parameters
    • Larger models mean more layers and deeper architectures.
    • More parameters mean we need more data to train, and risk overfitting if we do not have enough.
    • Larger models have greater learning capacity.
    • A large number of parameters also increases the chances of overfitting.
  • Link with receptive field
    • 1×1 convolution keeps the receptive field unchanged.
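
A minimal PyTorch transfer-learning sketch (the backbone, class count, and freezing policy are illustrative assumptions, not course requirements):

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone, freeze its features,
# and fine-tune a new classifier head.
model = models.resnet18(pretrained=True)

for p in model.parameters():          # freeze all pretrained layers
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 5)   # new head for 5 classes

# Only the new head's parameters are trainable now:
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)   # ['fc.weight', 'fc.bias']
```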
Assignment 3 Deliverable 1
4th May 2021: Transfer Learning and CNN Architectures
  • Padding types
  • Vanishing and exploding gradient problems when training large networks
  • Use of residual blocks to help gradients flow (see the sketch after this list)
    • Residual blocks
    • Skip connections
  • Using bottleneck layers to reduce computation
  • Different network types
    • AlexNet
    • VGG
    • Inception
    • ResNet
    • DenseNet
    • GoogLeNet
  • Semantic segmentation
    • Grouping pixels of the same class
    • A classification problem at the pixel level
  • Fully convolutional networks
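
A minimal PyTorch sketch of a basic residual block, whose identity skip connection gives gradients a direct path backwards:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)     # skip connection

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)      # torch.Size([1, 64, 32, 32])
```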

Assigned Reading

Recommended Readings

6th May 2021: Semantic Segmentation
  • CNNs for semantic segmentation (see the sketch after this list)
    • Deconvolution
    • Upscaling convolution (transposed convolution)
    • Fully Convolutional Networks
  • Semantic segmentation
    • Grouping pixels of the same class
    • A classification problem at the pixel level
  • Object detection
  • Classification
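
A minimal PyTorch sketch of a fully convolutional segmentation head (the channel counts, class count, and upsampling factor are assumptions): a 1×1 conv maps features to per-class scores, and a transposed convolution ("deconvolution") upsamples them back to input resolution.

```python
import torch
import torch.nn as nn

num_classes, C = 21, 256
head = nn.Sequential(
    nn.Conv2d(C, num_classes, kernel_size=1),                 # pixel-wise scores
    nn.ConvTranspose2d(num_classes, num_classes,
                       kernel_size=16, stride=8, padding=4),  # 8x upsampling
)

feats = torch.randn(1, C, 28, 28)     # backbone features at 1/8 resolution
print(head(feats).shape)              # torch.Size([1, 21, 224, 224])
```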

Assigned Reading

Recommended Readings

18th May 2021: Localization and Object Detection
  • Localization
    • How localization differs from classification
    • Localization as a regression problem
    • Bounding box regression
    • Classification head
    • Regression head
  • Object Detection
    • Object detection as classification and localization
    • Using the regression and classification heads for object detection
    • Intersection over Union (see the sketch after this list)
    • Sliding window method
    • Region proposal method
  • Non-Maximum Suppression
  • Selective search
  • Single-stage detectors
    • YOLO
    • SSD
  • Two-stage detectors
    • RCNN
    • Mask RCNN
  • Interesting project topics
    • Transformers
    • Graph Neural Networks
    • Self-supervised learning
    • Semi-supervised learning
    • Zero-shot learning
    • Few-shot learning
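
A minimal NumPy sketch of Intersection over Union and greedy non-maximum suppression (the boxes, scores, and threshold are made up):

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_thresh])
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]])
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2]: the near-duplicate box is suppressed
```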

Recommended Readings

YOLO, SSD, RCNN, Mask RCNN, and the Transformers paper

20th May 2021: Object Detection, Recurrent Neural Networks
  • Sliding window approach without anchors
  • Sliding window approach with anchors
  • Two-stage detectors
    • Fast RCNN
    • Faster RCNN
  • Single-stage detectors
    • YOLO
    • SSD
    • RetinaNet
  • Feature Pyramid Networks
  • Hard negative mining
  • Sequence modelling
    • Many-to-one, e.g. sentiment classification
    • Many-to-many, e.g. translation
    • One-to-many, e.g. image to caption
  • Recurrent Neural Networks (see the sketch after this list)
    • Output depends on the current input and the previous hidden state
    • Backpropagation through time
      • Vanishing/exploding gradients
      • Gated cells allow previous hidden states to bypass the current cell
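
A minimal NumPy sketch of a vanilla RNN cell, where the new hidden state depends on the current input and the previous hidden state (illustrative sizes and random inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W_xh = rng.standard_normal((input_dim, hidden_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Unroll over a sequence; training would backpropagate through these
# steps (backpropagation through time).
h = np.zeros(hidden_dim)
for t in range(10):
    x_t = rng.standard_normal(input_dim)
    h = rnn_step(x_t, h)
print(np.round(h, 3))
```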

Assigned Reading

Sequence Modeling: Recurrent and Recursive Nets, Chapter 10 of the textbook.

Recommended Readings

Faster RCNN, Fast RCNN, RetinaNet

Text Book

  • Text Book: Deep Learning by Ian Goodfellow Link
  • Reference Book: Dive into Deep Learning by Aston Zhang et al. Link

Recommended Readings

The following are recommended reading.

Toolkits 

PyTorch

Top Conferences to Follow

  • International Conference on Machine Learning (ICML)
  • Conference on Neural Information Processing Systems (NIPS)
  • International Joint Conference on Artificial Intelligence (IJCAI)
  • Conference on Computer Vision and Pattern Recognition (CVPR)
  • International Conference on Computer Vision (ICCV)
  • British Machine Vision Conference (BMVC)

Assignments

  • Assignment 1: Classification of MNIST Digits Using PyTorch:

  • Assignment 2: Implementation of Neural Network:

Some Interesting Links

Projects

The list of student projects will be shared here.