MIT EECS

6.S898 Deep Learning

Fall 2023

[ Schedule | Policies | Piazza | Canvas | Gradescope | Previous years ]

Course Overview

Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, graph nets, transformers), geometry and invariances in deep learning, backpropagation and automatic differentiation, learning theory and generalization in high-dimensions, and applications to computer vision, natural language processing, and robotics.

Pre-requisites: (6.3900 [6.036] or 6.C01 or 6.3720 [6.401]) and (6.3700 [6.041] or 6.3800 [6.008] or 18.05) and (18.C06 or 18.06)

Note: This course is appropriate for advanced undergraduates and graduate students, and is 3-0-9 units. For non-students who want access to Piazza or Canvas, email Anthea Li (yichenl@mit.edu) to be added manually. For non-MIT students, refer to cross-registration.




Course Information

Instructor Phillip Isola

phillipi at mit dot edu

OH: Mon. 2-3PM, Rm. 34-302

Instructor Sara Beery

beery at mit dot edu

OH: Tues. 9-10AM. Rm. 36-153

Instructor Jeremy Bernstein

jbernstein at mit dot edu

OH: Tues. 4-5PM, Rm. 34-302

TA Anthea Li

yichenl at mit dot edu

OH: Wed. 2-3PM, Rm. 24-308

TA Thien Le

thienle at mit dot edu

OH: Thurs. 3-4PM, Rm. 34-302

TA Saachi Jain

saachij at mit dot edu

OH: Wed. 10-11AM, Rm. 24-319

TA Veevee Cai

cail at mit dot edu

OH: Fri. 11-12PM, Rm. 24-319

TA Pratyusha Sharma

pratyusha at mit dot edu

OH: Wed. 9-10AM, Rm. 24-323

TA Jocelyn Shen

joceshen at mit dot edu

OH: Tues. 11:30-12:30PM, Rm. 24-323

- Logistics

- Grading Policy

  • 65% problem sets
  • 35% final project
  • Collaboration policy
  • AI assistants policy (ChatGPT, etc)
  • Late policy



    Class Schedule


    ** class schedule is subject to change **

    Date Topics Speaker Course Materials Assignments
    Week 1
    Thu 9/7 Course overview, introduction to deep neural networks and their basic building blocks Sara Beery slides

    notation for this course
    notes

    optional reading: Neural nets as distribution transformers
    Week 2
    Tue 9/12 How to train a neural net
    + details SGD, Backprop and autodiff, differentiable programming
    Sara Beery slides

    required reading: gradient-based learning
    required reading: backprop
    pset 1 out
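    A minimal sketch (not part of the official course materials) of this lecture's core loop, assuming PyTorch: reverse-mode autodiff computes gradients of a loss, and plain SGD updates the parameters. The toy linear-regression data, learning rate, and step count are illustrative choices.

```python
import torch

# Toy data: y = 3x + 1 plus noise (illustrative assumption).
torch.manual_seed(0)
x = torch.randn(256, 1)
y = 3.0 * x + 1.0 + 0.1 * torch.randn(256, 1)

# Parameters to learn; autograd tracks operations on these tensors.
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.1
for step in range(200):
    y_hat = x * w + b                 # forward pass builds the compute graph
    loss = ((y_hat - y) ** 2).mean()  # mean squared error
    loss.backward()                   # reverse-mode autodiff populates .grad
    with torch.no_grad():             # SGD update, outside the graph
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())  # should approach 3 and 1
```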
    Tue 9/12 (5-6PM ET) PyTorch Tutorial
    Saachi Jain Tutorial link, 32-D463 (Star Room in Stata)
    Wed 9/13 (10-11AM ET) PyTorch Tutorial
    Anthea Li Tutorial link, 32-D463 (Star Room in Stata)
    Thu 9/14 Approximation theory
    + details How well can you approximate a given function with a DNN? We will explore various facets of this question, from universal approximation to Barron's theorem, and ask whether increasing depth provably helps with expressivity.
    Phillip Isola slides

    optional reading: Deep learning theory notes sections 2 and 5 (this is written at a rather advanced level; try to get the intuitions rather than all the details)
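    A minimal sketch, assuming PyTorch, of the universal-approximation intuition: a single hidden layer can fit a smooth 1D target given enough width. The target function, width, and optimizer settings are illustrative, not claims about approximation rates.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 512).unsqueeze(1)
y = torch.sin(2 * x)  # smooth target function (illustrative)

# One hidden layer; widening it generally improves the fit.
net = nn.Sequential(nn.Linear(1, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.4f}")
```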
    Week 3
    Tue 9/19 Architectures: Grids
    + details This lecture will focus mostly on convolutional neural networks, presenting them as a good choice when your data lies on a grid.
    Phillip Isola slides
    required reading: CNNs
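    A minimal sketch, assuming PyTorch, of why convolutions suit grid data: the same small filter is applied at every location, so the parameter count is independent of image size. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# A 3x3 conv with 16 output channels: 3*16*3*3 + 16 parameters,
# regardless of whether the input image is 32x32 or 256x256.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))  # 448

small = torch.randn(1, 3, 32, 32)
large = torch.randn(1, 3, 256, 256)
print(conv(small).shape)  # torch.Size([1, 16, 32, 32])
print(conv(large).shape)  # torch.Size([1, 16, 256, 256])
```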
    Thu 9/22 Architectures: Graphs
    + details This lecture covers graph neural networks (GNNs), showing connections to MLPs, CNNs, and message passing algorithms. We will also discuss theoretical limitations on the expressive power of GNNs, and the practical implications of this.
    Phillip Isola slides
    required reading: Section 5 of GRL book
    optional reading: How Powerful are Graph Neural Networks
    optional reading: Distill blog on GNNs
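    A minimal sketch, assuming PyTorch, of one message-passing step: each node aggregates its neighbors' features (here via a row-normalized adjacency matrix) and passes the result through weights shared across nodes. The toy graph and dimensions are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_nodes, d_in, d_out = 5, 8, 16

# Symmetric adjacency with self-loops for a small made-up graph.
A = torch.tensor([[0, 1, 0, 0, 1],
                  [1, 0, 1, 0, 0],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0]], dtype=torch.float) + torch.eye(n_nodes)
deg = A.sum(dim=1, keepdim=True)
A_norm = A / deg                        # row-normalized aggregation

X = torch.randn(n_nodes, d_in)          # node features
W = nn.Linear(d_in, d_out)              # weights shared across all nodes

# One message-passing layer: aggregate neighbors, then transform.
H = torch.relu(W(A_norm @ X))
print(H.shape)  # torch.Size([5, 16])
```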
    Week 4
    Tue 9/26 Scaling rules for optimisation
    + details Spectral perspective on neural computation. Feature learning and hyperparameter transfer. Scaling rules for hyperparameter transfer across width and depth.
    Jeremy Bernstein slides pset 1 due
    pset 2 out
    Thu 9/28 Bayesian analysis of learning and generalisation
    + details Over-parameterisation. Inadequacy of VC dimension. Bayesian perspective. PAC-Bayes theory.
    Jeremy Bernstein slides
    optional reading: Understanding deep learning requires rethinking generalization
    optional reading: Probable networks and plausible predictions
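    For orientation (not from the course materials), one commonly quoted McAllester-style PAC-Bayes bound, written from memory and with constants that vary slightly across statements in the literature: for a prior P fixed before seeing the data and any posterior Q over hypotheses, with probability at least 1 − δ over an i.i.d. sample of size n,

```latex
\mathbb{E}_{h \sim Q}\big[L(h)\big]
\;\le\;
\mathbb{E}_{h \sim Q}\big[\hat{L}_n(h)\big]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\frac{2\sqrt{n}}{\delta}}{2n}}
```

    where L is the population loss and \hat{L}_n the empirical loss on the sample.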
    Week 5
    Tue 10/3 Guest Lecture: Tess Smidt
    + details Symmetry can occur in many forms. For physical systems in 3D, we have the freedom to choose any coordinate system and therefore any physical property must transform predictably under elements of Euclidean symmetry (3D rotations, translations and inversion). For algorithms involving the nodes and edges of graphs, we have symmetry under permutation of how the nodes and edges are ordered in computer memory. Unless coded otherwise, machine-learned models make no assumptions about the symmetry of a problem and will be sensitive to e.g. an arbitrary choice of coordinate system or ordering of nodes and edges in an array. One of the primary motivations of explicitly treating symmetry in machine learning models is to eliminate the need for data augmentation. Another motivation is that by encoding symmetry into a method, we get the guarantee that the model will give the "same" answer for an example and a "symmetrically equivalent" example even if the model was not explicitly trained on the "symmetrically equivalent" example. In this lecture, we will discuss several ways to make machine learning models "symmetry-aware" (e.g. input representation vs. loss vs. model architecture). We will focus on how to handle 3D Euclidean symmetry and permutation symmetry in neural networks, describe unintuitive and beneficial consequences of these symmetries, and discuss how to set up training tasks that are compatible with your assumptions of symmetry.
    Tess Smidt slides
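    A minimal sketch, assuming PyTorch, of the guarantee discussed above: a model that sum-pools per-point features before its readout is permutation-invariant by construction, so permuting the input leaves the output unchanged with no augmentation needed. The architecture and point set are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Per-point encoder + sum pooling + readout: permutation-invariant by construction.
phi = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 32))
rho = nn.Linear(32, 1)

def model(points):                 # points: (n_points, 3)
    return rho(phi(points).sum(dim=0))

x = torch.randn(10, 3)             # a made-up 3D point set
perm = torch.randperm(10)

out_original = model(x)
out_permuted = model(x[perm])
print(torch.allclose(out_original, out_permuted, atol=1e-5))  # True (up to float precision)
```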
    Thu 10/5 Hacker's guide to DL
    + details
    Phillip Isola slides
    optional reading: A recipe for training NNs
    optional reading: Rules of ML
    Week 6
    Tue 10/10 Student holiday
    Wed 10/11 pset 2 due
    pset 3 out
    Thu 10/12 Architectures -- transformers
    + details Transformers. Three key ideas: tokens, attention, positional codes. Relationship between transformers and MLPs, GNNs, and CNNs -- they are all variations on the same themes!
    Sara Beery slides
    required reading: Transformers (note that this reading focuses on examples from vision but you can apply the same architecture to any kind of data)
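    A minimal sketch, assuming PyTorch, of scaled dot-product attention, the core operation of this lecture: each token's query scores all keys, a softmax turns scores into weights, and the output mixes the values. Shapes are illustrative; real transformers add multiple heads, positional codes, and residual blocks.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_tokens, d = 6, 32

# Made-up token embeddings and projection matrices.
X = torch.randn(n_tokens, d)
Wq = torch.randn(d, d) / d**0.5
Wk = torch.randn(d, d) / d**0.5
Wv = torch.randn(d, d) / d**0.5

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / d**0.5            # (n_tokens, n_tokens) similarity scores
weights = F.softmax(scores, dim=-1)  # each row sums to 1
out = weights @ V                    # each token is a weighted mix of values
print(out.shape)                     # torch.Size([6, 32])
```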
    Week 7
    Tue 10/17 Architectures -- memory
    + details RNNs, LSTMs, memory, sequence models.
    Sara Beery slides
    required reading: RNNs
    optional reading: RNN Stability analysis and LSTMs
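    A minimal sketch, assuming PyTorch, of the recurrence underlying RNNs: a single hidden state is updated token by token with the same weights at every step, so one cell handles sequences of any length. Sizes are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_hidden, seq_len = 8, 16, 10

W_xh = nn.Linear(d_in, d_hidden)
W_hh = nn.Linear(d_hidden, d_hidden)

x = torch.randn(seq_len, d_in)       # a made-up input sequence
h = torch.zeros(d_hidden)            # initial hidden state

for t in range(seq_len):             # same weights reused at every step
    h = torch.tanh(W_xh(x[t]) + W_hh(h))

print(h.shape)  # torch.Size([16]) -- a summary of the whole sequence
```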
    Thu 10/19 Representation learning -- reconstruction-based
    + details Intro to representation learning, representations in nets and in the brain, autoencoders, clustering and VQ, self-supervised learning with reconstruction losses.
    Phillip Isola slides
    required reading: Representation learning
    optional reading: Representation learning: A review
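    A minimal sketch, assuming PyTorch, of reconstruction-based representation learning: an encoder compresses the input through a low-dimensional bottleneck, a decoder reconstructs it, and the bottleneck activations serve as the learned representation. Dimensions and data are stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_code = 64, 8

encoder = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(), nn.Linear(32, d_code))
decoder = nn.Sequential(nn.Linear(d_code, 32), nn.ReLU(), nn.Linear(32, d_in))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(512, d_in)           # stand-in for real data

for step in range(500):
    z = encoder(x)                   # low-dimensional code
    x_hat = decoder(z)
    loss = ((x_hat - x) ** 2).mean() # reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(z.shape)  # torch.Size([512, 8]) -- the learned representation
```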
    Fri 10/20 Final project proposal guidelines out
    Week 8
    Tue 10/24 Representation learning -- similarity-based
    + details In this lecture, we will talk about unsupervised and weakly supervised learning, primarily through the lens of similarity-driven learning. I’ll briefly talk about metric learning first, before moving on to self-supervised learning with a focus on contrastive learning (the modern cousin of metric learning).
    Sara Beery slides
    required reading: (same as previous lecture)
    optional reading: Contrastive feature alignment
    optional reading: Contrastive learning
    pset 3 due
    pset 4 out
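    A minimal sketch, assuming PyTorch, of a contrastive (InfoNCE-style) objective: two augmented views of the same example should embed similarly, while the rest of the batch serves as negatives. The encoder, the noise "augmentation", and the temperature are illustrative stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch, d_in, d_emb, temperature = 32, 64, 16, 0.1

encoder = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_emb))
x = torch.randn(batch, d_in)                     # stand-in for real data

# Two "views" of each example; real pipelines use semantic augmentations.
z1 = F.normalize(encoder(x + 0.1 * torch.randn_like(x)), dim=1)
z2 = F.normalize(encoder(x + 0.1 * torch.randn_like(x)), dim=1)

logits = z1 @ z2.T / temperature                 # cosine similarities
labels = torch.arange(batch)                     # positives on the diagonal
loss = F.cross_entropy(logits, labels)           # InfoNCE-style loss
print(loss.item())
```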
    Thu 10/26 Representation learning -- theory
    + details
    Jeremy Bernstein slides
    optional reading: Kernel methods for DL
    optional reading: DNN as Gaussian Processes
    Week 9
    Tue 10/31 Generative models -- basics
    + details Density and energy models, samplers, GANs, autoregressive models, diffusion models
    Phillip Isola slides
    required reading: Generative Models
    Thu 11/2 Generative models -- representation learning meets generative modeling
    + details VAEs, latent variables
    Phillip Isola slides

    required reading: Generative modeling meets representation learning
    optional reading: VAE paper
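    A minimal sketch, assuming PyTorch, of the VAE objective: the encoder outputs a mean and log-variance, a latent is sampled with the reparameterization trick, and the loss is reconstruction error plus a KL penalty toward a standard normal prior. Dimensions and data are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_latent = 64, 8

enc = nn.Linear(d_in, 2 * d_latent)   # outputs [mu, log_var]
dec = nn.Linear(d_latent, d_in)

x = torch.randn(128, d_in)            # stand-in for real data

mu, log_var = enc(x).chunk(2, dim=1)
std = torch.exp(0.5 * log_var)
z = mu + std * torch.randn_like(std)  # reparameterization trick

x_hat = dec(z)
recon = ((x_hat - x) ** 2).sum(dim=1).mean()
# KL( N(mu, sigma^2) || N(0, I) ), closed form for diagonal Gaussians.
kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(dim=1).mean()
loss = recon + kl                     # negative ELBO (up to constants)
print(loss.item())
```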
    Week 10
    Tue 11/7 Generative models -- conditional models
    + details cGAN, cVAE, paired and unpaired translation, image-to-image, text-to-image, world models
    Phillip Isola slides

    required reading: Conditional generative models
    pset 4 due
    pset 5 out
    Thu 11/9 Generalization (OOD)
    + details
    Sara Beery slides
    required reading: Adversarial examples
    required reading: Training robust classifiers
    required reading: WILDS: A Benchmark of in-the-Wild Distribution Shifts
    optional reading: Shortcuts in NN
    optional reading: From ImageNet to Image Classification
    optional reading: Noise or Signal
    optional reading: Extrapolation
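    A minimal sketch, assuming PyTorch, of the fast gradient sign method from the adversarial-examples reading: perturb the input by the sign of the loss gradient under a small L-infinity budget. The randomly initialized classifier and epsilon are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Untrained stand-in classifier; in practice you would attack a trained model.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
x = torch.randn(1, 64, requires_grad=True)
y = torch.tensor([3])                 # made-up true label
epsilon = 0.05                        # L-infinity perturbation budget

loss = F.cross_entropy(model(x), y)
loss.backward()

x_adv = x + epsilon * x.grad.sign()   # one FGSM step
print(F.cross_entropy(model(x_adv), y).item() >= loss.item())  # usually True
```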
    Fri 11/10 project proposal due @ 11:59PM EST
    Week 11
    Tue 11/14 Transfer learning -- models
    + details Finetuning, linear probes, knowledge distillation, foundation models
    Sara Beery slides
    required reading: Transfer learning and adaptation
    pset 5 due
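    A minimal sketch of a linear probe, assuming PyTorch and a recent torchvision: freeze a pretrained backbone and train only a new linear head on its features. The ResNet-18 backbone, class count, and fake batch are illustrative; finetuning would instead leave some or all of the backbone trainable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

# Pretrained backbone (downloads ImageNet weights on first use).
backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()           # expose 512-d features
for p in backbone.parameters():
    p.requires_grad = False           # linear probe: backbone stays frozen
backbone.eval()

n_classes = 10                        # illustrative downstream task
head = nn.Linear(512, n_classes)      # the only trainable parameters
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(8, 3, 224, 224)       # stand-in for a batch of images
y = torch.randint(0, n_classes, (8,))

with torch.no_grad():
    feats = backbone(x)               # (8, 512) frozen features
loss = F.cross_entropy(head(feats), y)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```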
    Thu 11/16 Transfer learning -- data
    + details Generative models as data++, domain adaptation, prompting
    Sara Beery slides
    required reading: (same as previous lecture)
    Week 12
    Tue 11/21 Guest lecture: Large Language Models
    + details
    Yoon Kim
    Thu 11/23 No class: Thanksgiving
    Week 13
    Tue 11/28 Scaling laws
    + details Scaling laws for different neural architectures, power laws, breaking power laws, theoretical underpinnings, critical batch size
    Phillip Isola slides

    required reading: Scaling Laws for Neural Language Models
    optional reading: Chinchilla scaling laws
    optional reading: Data manifold argument
    optional reading: Breaking power laws via data pruning
    optional reading: Critical batch size
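    A minimal sketch, assuming NumPy, of the empirical exercise behind scaling laws: if loss follows L(N) ≈ a·N^(−α), then log L is linear in log N and the exponent falls out of a least-squares fit. The synthetic measurements stand in for real (model size, loss) data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (model size, loss) pairs following L = a * N^(-alpha) with noise.
a_true, alpha_true = 5.0, 0.3
N = np.logspace(6, 10, 20)                     # model sizes from 1e6 to 1e10
L = a_true * N ** (-alpha_true) * np.exp(0.02 * rng.standard_normal(20))

# Power law => straight line in log-log space: log L = log a - alpha * log N.
slope, intercept = np.polyfit(np.log(N), np.log(L), deg=1)
print(f"fitted alpha ~ {-slope:.3f}, fitted a ~ {np.exp(intercept):.3f}")
```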
    Thu 11/30 Automatic gradient descent
    + details Advanced tools for thinking about gradient descent on arbitrary computational graphs. Includes "metrisation" and "non-dimensionalisation" of neural architecture.
    Jeremy Bernstein slides
    Week 14
    Tue 12/5 Project office hours
    Thu 12/7 Deploying computer vision systems - A case study on birdsong identification
    + details TBA
    Grant Van Horn
    Week 15
    Tue 12/12 Past & future of deep learning
    + details Overview of key topics from the course; data and model scaling; LLMs and reasoning agents; automation and regulation
    Jeremy Bernstein Final project due


    Collaboration policy



    AI assistants policy



    Late policy