MIT CSAIL

6.S898 Deep Learning

Fall 2021

[ Schedule | Piazza | Canvas ]

Zoom link for class


Course Overview

Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, transformers), backpropagation and automatic differentiation, learning theory and generalization in high dimensions, and applications to computer vision, natural language processing, and robotics. Each lecture will be given by a different invited expert in the field.

Prerequisites: 6.036, 6.041 or 6.042, 18.06. In short, if you have taken an intro course on ML at the level of 6.036 or beyond, you should be in good shape.

Note: This is a graduate-level course of 3-0-3 units. To access the Zoom link, you will need an account with an MIT, Harvard, or Wellesley email. Non-students who want access to Piazza or Canvas should email the TA to be added manually. Non-MIT students should refer to cross-registration. The class will be held in person and streamed live over Zoom.




Course Information

Instructor: Phillip Isola

phillipi at mit dot edu

OH: Mon 3:00pm-4:00pm (Zoom)

TA: Minyoung Huh (Jacob)

minhuh at mit dot edu

OH: Thu 3:00pm-4:00pm (Zoom)

- Logistics

- Grading Policy

  • 5% participation
  • 60% weekly questions
  • 35% final project



Class Schedule

** Class schedule is subject to change **

Week 1: Introduction to neural networks

Thu 9/9: Course overview, introduction to deep neural networks and their basic building blocks
Speaker: Phillip Isola
Materials: slides, notes
Week 2: Differentiable programming and rethinking generalization

Tue 9/14: SGD, backprop and autodiff, differentiable programming
Speaker: Phillip Isola
Materials: slides
Assignments: HW 1 released

Thu 9/16: Generalization in neural networks: classical and modern perspectives
Abstract: We will start by briefly discussing the classical approach to generalization bounds, large margin theory, and complexity of neural networks. We then discuss recent interpolation results, the double or multiple-descent phenomenon, and the linear regime in overparametrized neural networks.
Speaker: Alexander Rakhlin
Materials: slides
Assigned readings:
- Double descent
- Benign overfitting (proofs optional)
Optional readings:
- Deep double descent
- Rethinking generalization
Week 3: Generalization and approximation theory

Tue 9/21: Out-of-distribution generalization, bias, adversaries, and robustness
Abstract: Our current machine learning models achieve impressive performance on many benchmark tasks. Yet these models can become remarkably brittle and susceptible to manipulation when deployed in the real world. Why is this the case? In this lecture, we take a closer look at this question and pinpoint some of the roots of this observed brittleness. Specifically, we discuss how the way current ML models “learn” might not necessarily align with our expectations, and then outline possible approaches to alleviate this misalignment.
Speaker: Aleksander Madry
Materials: slides
Assignments: HW 1 due (1pm) / HW 2 released
Assigned readings:
- Adversarial examples are not bugs (blog)
Optional readings:
- From ImageNet to Image Classification (blog)
- Noise or signal (blog)

Thu 9/23: Approximation theory
Abstract: How well can a given function be approximated by a DNN? We will explore various facets of this question, from universal approximation to Barron's theorem, and ask whether increasing depth provably helps expressivity.
Speaker: Ankur Moitra
Materials: notes (18.408)
Assigned readings:
- Universal approximation bounds (proofs optional; focus on Sections I and II)
Optional readings:
- Benefits of depth
- Neural Tangent Kernels
- Power of depth
- NNs are universal approximators
Week 4: Deep neural architectures

Tue 9/28: CNNs, Transformers, ResNets, and encoder-decoders
Abstract: In this lecture we will cover convolutional neural networks, including the ideas of filtering, patch-wise processing, multiscale representations, encoder-decoder architectures, residual connections, and self-attention layers. We will use image and video modeling problems as a lens onto when and why these architectures are useful.
Speaker: Phillip Isola
Assignments: HW 2 due (1pm) / HW 3 released
Assigned readings:
- AlexNet
Optional readings:
- Residual networks
- Non-local networks
- Vision transformers (read "Attention is all you need" first)

Thu 9/30: RNNs, feedback, memory models, and sequence models
Abstract: In this lecture we will learn about recurrent neural networks and attention, which are fundamental building blocks of natural language processing systems. We will motivate these modules with two standard tasks: language modeling and machine translation.
Speaker: Yoon Kim
Assigned readings:
- Neural Machine Translation
- Attention is all you need (blog)
Optional readings:
- Seq2Seq
- Probabilistic Language Model
- Difficulty of training RNNs
- Understanding RNNs
- Effectiveness of RNNs
Week 5: Hacker’s guide

Tue 10/5: How to be an effective deep learning researcher / engineer
Speaker: Dylan Hadfield-Menell

Thu 10/7: Visualization and interpretation of deep embeddings
Speaker: David Bau

Week 6: Generative modeling and representation learning

Tue 10/12: Generative models
Speaker: Phillip Isola

Thu 10/14: Self-supervised scene representation learning
Speaker: Vincent Sitzmann

Week 7: Generative modeling and representation learning

Tue 10/19: Cross-modal learning, self-supervision
Speaker: Jim Glass

Thu 10/21: Contrastive learning / metric learning
Speaker: Suvrit Sra

Week 8: Deep nets as priors

Tue 10/26: Finetuning, transfer learning, distillation; "foundation models"
Speaker: Phillip Isola

Thu 10/28: Meta-learning
Speaker: Iddo Drori

Week 9: Deep learning on structured data

Tue 11/2: Graph neural networks
Speaker: Stefanie Jegelka

Thu 11/4: Geometric deep learning
Speaker: Chris Scarvelis

Week 10: Hardware for deep learning

Tue 11/9: Hardware architectures for deep learning
Speaker: Joel Emer

Thu 11/11: 🎖️ No class - Veterans Day 🎖️

Week 11: Deep reinforcement learning

Tue 11/16: Control, policy gradient, Q-learning
Speaker: Abhishek Gupta

Thu 11/18: Deep RL for robotics
Speaker: Pulkit Agrawal

Week 12: Neurosymbolic systems

Tue 11/23: Compositionality and structured generalization
Speaker: Jacob Andreas

Thu 11/25: 🦃 No class - Thanksgiving 🦃

Week 13: Sociotechnical problems

Tue 11/30: AI fairness
Speaker: TBD

Thu 12/2: Industry
Speaker: TBD

Week 14: Additional topics

Tue 12/7: TBD
Speaker: TBD

Thu 12/9: Drug discovery
Speaker: Wengong Jin