6.S898 Deep Learning

Fall 2022


Course Overview

Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, transformers), backpropagation and automatic differentiation, learning theory and generalization in high dimensions, and applications to computer vision, natural language processing, and robotics.

Pre-requisites: (6.3900 [6.036] or 6.C01 or 6.3720 [6.401]) and (6.3700 [6.041] or 6.3800 [6.008] or 18.05) and (18.C06 or 18.06)

Note: This course is appropriate for advanced undergraduates and graduate students, and is 3-0-9 units. For non-students who want access to Piazza or Canvas, email Aidan Curtis to be added manually. For non-MIT students, refer to cross-registration.

Lectures will be in-person only; if there is an important reason you cannot make class, you may email Aidan Curtis to get a recording.

Course Information

Instructor Phillip Isola

phillipi at mit dot edu

OH: Thu 2:30pm-3:30pm (2-146).

Instructor Stefanie Jegelka

stefje at csail dot mit dot edu

OH: Thu 2:30pm-3:30pm (2-146).

TA Tongzhou Wang

tongzhou at mit dot edu

OH: Mon 1:00pm-2:00pm (24-317).

TA Aidan Curtis

curtisa at mit dot edu

OH: Mon 1:00pm-2:00pm (24-317).

- Logistics

- Grading Policy

  • 5% participation
  • 65% problem sets
  • 30% final project
  • Collaboration policy
  • Late policy
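
As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the final grade is a plain weighted average; the function name `course_grade` and the example scores are hypothetical, and late penalties and slack days are not modeled.

```python
# Weights taken from the grading policy above.
# Everything else (0-100 scale, function name, example scores) is an assumption.
WEIGHTS = {"participation": 0.05, "problem_sets": 0.65, "final_project": 0.30}


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of the component scores (each 0-100)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must have exactly the keys in WEIGHTS")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: full participation, 80 on problem sets, 90 on the project.
example = {"participation": 100.0, "problem_sets": 80.0, "final_project": 90.0}
print(f"{course_grade(example):.1f}")
```

Because problem sets carry 65% of the weight, a 10-point change in the problem set average moves this hypothetical grade by 6.5 points.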

    Class Schedule

    ** class schedule is subject to change **

    Date Topics Speaker Course Materials Assignments
    Week 1
    Thu 9/8 Course overview, introduction to deep neural networks and their basic building blocks Phillip Isola & Stefanie Jegelka slides
    Week 2
    Tue 9/13 How to train a neural net
    + details SGD, Backprop and autodiff, differentiable programming
    Phillip Isola slides
    pset 1 out
    Thu 9/15 Approximation theory
    + details How well can you approximate a given function by a DNN? We will explore various facets of this issue, from universal approximation to Barron's theorem. And does increasing the depth provably help for expressivity?
    Stefanie Jegelka slides
    Week 3
    Tue 9/20 Generalization theory (IID)
    + details We will start by briefly discussing the classical approach to generalization bounds, large margin theory, and complexity of neural networks. We then discuss recent interpolation results, the double or multiple-descent phenomenon, and the linear regime in overparametrized neural networks.
    Stefanie Jegelka slides
    double descent
    + optional notes    deep double descent
       benign overfit
       simple case (§2.3)
    Thu 9/22 Architectures -- Grids
    + details CNNs
    Phillip Isola slides
    Week 4
    Tue 9/27 Architectures -- Graphs
    + details GNNs
    Stefanie Jegelka slides
       GNNs (§5, optional §7.3)
       representation power of GNNs
    + optional notes    GNN intro part 1 part 2
       Neural message passing for quantum chemistry (§2)
       GNN representation theory
    pset 1 due
    pset 2 out
    Thu 9/29 Geometric deep learning
    + details Inductive biases of archs, invariances and equivariances
    Stefanie Jegelka slides
    Geometric DL (§3, rest optional)
    + optional notes    Geometric DL blog
       Deep sets
       Equivariance in DL
    Week 5
    Tue 10/4 Hacker's guide to DL
    + details In this lecture, we'll discuss the practical side of developing deep learning systems. We will focus on best practices, common mistakes to look for, and evaluation methods for developing deep learning models. While optimization methods and software design practices for deep learning are still under development, this lecture will present several tried-and-true implementation and debugging strategies for diagnosing failures in model training, to help make model training less painful in the future.
    Phillip Isola slides
    Thu 10/6 Architectures -- transformers
    + details Transformers. Three key ideas: tokens, attention, positional codes. Relationship between transformers and MLPs, GNNs, and CNNs -- they are all variations on the same themes!
    Phillip Isola slides
    Week 6
    Wed 10/12
    pset 3 out
    Thu 10/13 Architectures -- memory
    + details RNNs, LSTMs, memory, sequence models.
    Phillip Isola slides
    Week 7
    Tue 10/18 Representation learning -- reconstruction-based
    + details Intro to representation learning, representations in nets and in the brain, autoencoders, clustering and VQ, self-supervised learning with reconstruction losses.
    Phillip Isola slides
    notes (optional)
    Thu 10/20 Representation learning -- similarity-based
    + details In this lecture, we will talk about unsupervised and weakly supervised learning, primarily through the lens of similarity-driven learning. I’ll briefly talk about metric learning first, before moving on to self-supervised learning with a focus on contrastive learning (the modern cousin of metric learning).
    Stefanie Jegelka notes
    contrastive feature geometry (align+uniform)
    contrastive learning
    pset 2 due
    pset 3 due
    pset 4 out
    Week 8
    Tue 10/25 Representation learning -- theory
    Stefanie Jegelka slides
    inductive bias (negative results; optional)
    simplicity bias (low-rank; optional)
    pitfalls of simplicity bias (optional)
    + more optional notes    simplicity bias (SGD; function complexity)
       factors affecting features
       shortcuts in contrastive learning
    Thu 10/27 DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
    [Guest Lecture]
    Gabriele Corso slides
    paper (optional)
    Week 9
    Tue 11/1 Generative models -- basics
    + details Density and energy models, samplers, GANs, autoregressive models, diffusion models
    Phillip Isola slides
    denoising diffusion (optional) diffusion blog (optional)
    Thu 11/3 Generative models -- representation learning meets generative modeling
    + details VAEs, latent variables
    Phillip Isola slides
    pset 4 due
    pset 5 out project handout
    Week 10
    Tue 11/8 Generative models -- conditional models
    + details cGAN, cVAE, paired and unpaired translation, image-to-image, text-to-image, world models
    Phillip Isola slides
    Thu 11/10 Generalization (OOD)
    Stefanie Jegelka slides
    notes (adv. examples)
    notes (robust opt.)
    notes (shortcuts in NNs; optional)
    notes (extrapolation; optional)
    Fri 11/11 project proposal due
    Week 11
    Tue 11/15 Transfer learning -- models
    + details Finetuning, linear probes, knowledge distillation, foundation models
    Phillip Isola slides
    notes (foundation models; optional)
    Thu 11/17 Transfer learning -- data
    + details Generative models as data++, domain adaptation, prompting
    Phillip Isola slides
    notes (MAML)
    notes (DatasetGAN)
    pset 5 due
    Week 12
    Tue 11/22 Scaling laws
    Stefanie Jegelka slides
    Week 13
    Tue 11/29 Curiosities about NN optimization and stability
    Stefanie Jegelka slides
    notes (edge of stability; optional)
    notes (unstable convergence; optional)
    notes (stability; optional)
    notes (stability of SGD; optional)
    notes (convergence to invariant measure; Sec. 1-3; optional)
    notes (statistical algorithmic stability; Sec. 1-3; optional)
    Thu 12/1 Energy-efficient deep learning
    Vivienne Sze slides
    Week 14
    Tue 12/6 Toward Responsibly-Deployable Deep Learning
    Tom Hartvigsen
    Thu 12/8 No lecture
    OH at the usual lecture location and hour
    Week 15
    Tue 12/13 Poster session (1pm to 3pm)
    Grier Room (34-401)
    Final project (blog + poster) due

    Collaboration policy

    Late policy

    Additionally, you have 3 slack days for the semester. We will waive up to 3 days' worth of late penalties. This only waives existing late penalties; it CANNOT be used to extend a PSet past a week! We will automatically choose which assignments the slack days are applied to, to maximize your grade.

    Previous years

    Fall 2021