MIT EECS

6.7960 Deep Learning

Fall 2024

[ Schedule | Policies | Piazza | Canvas | Gradescope | Lecture Recordings | Previous years ]

Course Overview

Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, graph nets, transformers), geometry and invariances in deep learning, backpropagation and automatic differentiation, learning theory and generalization in high-dimensions, and applications to computer vision, natural language processing, and robotics.

Pre-requisites: 18.05 and (6.3720, 6.3900, or 6.C01)

Note: This course is appropriate for advanced undergraduates and graduate students, and is 3-0-9 units. Due to heavy enrollment, we unfortunately cannot accept cross-registrations this semester.




Course Information

Instructor Phillip Isola

phillipi at mit dot edu

OH: Mon 2-3pm 45-344

Instructor Sara Beery

beery at mit dot edu

OH: Mon 9-10am 45-344

Instructor Jeremy Bernstein

jbernstein at mit dot edu

OH: Weds 9-10am 45-344

TA Behrooz Tahmasebi

bzt at mit dot edu

OH: Fri 9-10am 45-344

TA Robert Calef

rcalef at mit dot edu

OH: Fri 3-4pm 45-344

TA Jamie Meindl

jmeindl at mit dot edu

OH: Weds 1-2pm 45-344

TA Sharut Gupta

sharut at mit dot edu

OH: Thurs 3-4pm 45-344

TA David Forman

formand at mit dot edu

OH: Tues 12-1pm 45-344

TA Sebastian Alberdi

salberdi at mit dot edu

OH: Thurs 12-1pm 45-344

TA Ishan Ganguly

iganguly at mit dot edu

OH: Thurs 5-6pm 24-112

TA Jeff Lai

clai24 at mit dot edu

OH: Tues 4-5pm 45-344

TA Isabella Yu

iyu at mit dot edu

OH: Tues 9:45am-10:45am 45-344

TA Elizaveta Tremsina

etrem at mit dot edu

OH: Mon 1-2pm 45-344

Course Assistant Taylor Braun

tvbraun at mit dot edu

- Logistics

- Grading Policy

  • 65% problem sets
  • 35% final project
  • Collaboration policy
  • AI assistants policy (ChatGPT, etc.)
  • Attendance policy
  • Late policy
- Materials

     



    Class Schedule


    ** class schedule is subject to change **

    Date Topics Speaker Course Materials Assignments
    Week 1
    Thu 9/5 Course overview, introduction to deep neural networks and their basic building blocks Sara Beery slides

    required readings:
    notation for this course
    intro to neural networks

    optional readings:
    Neural nets as distribution transformers
    Week 2
    Tue 9/10 How to train a neural net
    + details SGD, Backprop and autodiff, differentiable programming
    Sara Beery slides

    required readings:
    gradient-based learning
    backprop
    pset 1 out
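
    (Illustrative, not course material.) A minimal PyTorch sketch of the ideas in the lecture above: reverse-mode autodiff computes gradients of a scalar loss, and SGD updates parameters with them. The toy data and learning rate are arbitrary choices.

        import torch

        # Tiny dataset: fit y = 3x + 1 with a two-parameter linear model.
        x = torch.linspace(-1, 1, 64).unsqueeze(1)
        y = 3 * x + 1

        w = torch.zeros(1, requires_grad=True)   # weight
        b = torch.zeros(1, requires_grad=True)   # bias

        lr = 0.1
        for step in range(200):
            pred = x * w + b                     # forward pass builds the compute graph
            loss = ((pred - y) ** 2).mean()      # scalar training loss
            loss.backward()                      # backprop fills w.grad and b.grad
            with torch.no_grad():                # SGD update, outside the graph
                w -= lr * w.grad
                b -= lr * b.grad
                w.grad.zero_()
                b.grad.zero_()

        print(w.item(), b.item())                # approaches 3.0 and 1.0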
    Tue 9/10 (5:30-6:30PM ET) PyTorch Tutorial
    Jamie Meindl Tutorial link 45-230
    Wed 9/11 (11am-noon ET) PyTorch Tutorial
    Sharut Gupta Tutorial link 54-100
    Thu 9/12 Approximation theory
    + details How well can you approximate a given function by a DNN? We will explore various facets of this question, from universal approximation to Barron's theorem. Does increasing depth provably help expressivity?
    Jeremy Bernstein slides

    optional reading:
    Deep learning theory notes sections 2 and 5 (this is written at a rather advanced level; try to get the intuitions rather than all the details)
    Week 3
    Tue 9/17 Architectures: Grids
    + details This lecture will focus mostly on convolutional neural networks, presenting them as a good choice when your data lies on a grid.
    Sara Beery slides

    required reading:
    CNNs
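
    (Illustrative, not course material.) A minimal sketch of the grid-structured computation in the CNN lecture above: a convolution slides one small filter over the image, sharing weights across spatial positions. The shapes are arbitrary.

        import torch
        import torch.nn as nn

        conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        img = torch.randn(1, 3, 32, 32)   # one RGB image, 32x32 pixels
        features = conv(img)              # padding=1 preserves the spatial size
        print(features.shape)             # torch.Size([1, 16, 32, 32])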
    Thu 9/19 Architectures: Graphs
    + details This lecture covers graph neural networks (GNNs), showing connections to MLPs, CNNs, and message passing algorithms. We will also discuss theoretical limitations on the expressive power of GNNs, and the practical implications of this.
    Phillip Isola slides

    required reading:
    Section 5 of GRL book (mainly focus on the content through 5.1)

    optional readings:
    How Powerful are Graph Neural Networks
    Distill blog on GNNs
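
    (Illustrative, not course material.) A minimal sketch of one message-passing step from the GNN lecture above: each node aggregates its neighbors' features and combines them with its own. The layer name and dimensions are assumptions.

        import torch
        import torch.nn as nn

        class MessagePassingLayer(nn.Module):
            def __init__(self, dim):
                super().__init__()
                self.w_self = nn.Linear(dim, dim)
                self.w_neigh = nn.Linear(dim, dim)

            def forward(self, h, adj):
                # h: (num_nodes, dim) node features; adj: (num_nodes, num_nodes) 0/1 adjacency
                neighbor_sum = adj @ h   # aggregate messages from neighbors
                return torch.relu(self.w_self(h) + self.w_neigh(neighbor_sum))

        adj = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=torch.float)
        h = torch.randn(4, 8)                         # 4 nodes on a path graph, 8-dim features
        print(MessagePassingLayer(8)(h, adj).shape)   # torch.Size([4, 8])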
    Week 4
    Tue 9/24 Generalization Theory
    + details Basic generalization theory. Overparameterization. Double descent. Inadequacy of VC dimension. Inductive biases in deep learning.
    Phillip Isola slides

    optional readings:
    Understanding deep learning requires rethinking generalization
    Double descent
    Probable networks and plausible predictions
    pset 1 due
    pset 2 out
    Thu 9/26 Scaling rules for optimisation
    + details Spectral perspective on neural computation. Feature learning and hyperparameter transfer. Scaling rules for hyperparameter transfer across width and depth.
    Jeremy Bernstein slides

    required reading:
    Steepest descent
    Week 5
    Tue 10/1 Architectures: Transformers
    + details Transformers. Three key ideas: tokens, attention, positional codes. Relationship between transformers and MLPs, GNNs, and CNNs -- they are all variations on the same themes!
    Phillip Isola slides

    required reading:
    Transformers (note that this reading focuses on examples from vision but you can apply the same architecture to any kind of data)
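
    (Illustrative, not course material.) A minimal sketch of scaled dot-product attention, the core operation named in the transformer lecture above; single head, no masking or positional codes.

        import torch

        def attention(q, k, v):
            # q, k, v: (num_tokens, dim)
            scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # token-token similarities
            weights = scores.softmax(dim=-1)                        # each row sums to 1
            return weights @ v                                      # weighted mix of values

        tokens = torch.randn(10, 64)                     # 10 tokens with 64-dim embeddings
        print(attention(tokens, tokens, tokens).shape)   # self-attention: torch.Size([10, 64])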
    Thu 10/3 Hacker's guide to DL
    + details Practical tips mixed with opinionated anecdotes about how to get deep nets to actually do what you want.
    Phillip Isola slides

    optional readings:
    Recipes for training NNs
    Rules of ML
    Week 6
    Tue 10/8 Architectures: Memory
    + details RNNs, LSTMs, memory, sequence models.
    Sara Beery slides

    required reading:
    RNNs

    optional reading:
    RNN Stability analysis and LSTMs
    pset 2 due
    pset 3 out
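
    (Illustrative, not course material.) A minimal sketch of a vanilla recurrent cell from the memory lecture above: the same weights are applied at every time step, and the hidden state carries information forward. Dimensions are arbitrary.

        import torch
        import torch.nn as nn

        class RNNCell(nn.Module):
            def __init__(self, in_dim, hidden_dim):
                super().__init__()
                self.w_in = nn.Linear(in_dim, hidden_dim)
                self.w_hid = nn.Linear(hidden_dim, hidden_dim)

            def forward(self, x_t, h):
                return torch.tanh(self.w_in(x_t) + self.w_hid(h))   # new hidden state

        cell = RNNCell(in_dim=8, hidden_dim=32)
        x = torch.randn(10, 8)            # a length-10 sequence of 8-dim inputs
        h = torch.zeros(32)
        for t in range(x.shape[0]):       # unroll over time, reusing the same cell
            h = cell(x[t], h)
        print(h.shape)                    # torch.Size([32])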
    Thu 10/10 Representation learning: Reconstruction-based
    + details Intro to representation learning, representations in nets and in the brain, autoencoders, clustering and VQ, self-supervised learning with reconstruction losses.
    Phillip Isola slides

    required reading:
    Representation learning

    optional reading:
    Representation learning: A review
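
    (Illustrative, not course material.) A minimal sketch of reconstruction-based representation learning from the lecture above: an autoencoder trained to reproduce its input, with the bottleneck code serving as the learned representation. Dimensions are arbitrary.

        import torch
        import torch.nn as nn

        class AutoEncoder(nn.Module):
            def __init__(self, in_dim=784, code_dim=32):
                super().__init__()
                self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, code_dim))
                self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

            def forward(self, x):
                z = self.encoder(x)               # compressed representation
                return self.decoder(z), z

        x = torch.rand(16, 784)                   # a batch of flattened images
        recon, z = AutoEncoder()(x)
        loss = nn.functional.mse_loss(recon, x)   # reconstruction loss to minimize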
    Week 7
    Tue 10/15 Student holiday
    Thu 10/17 Representation learning -- similarity-based
    + details Metric learning, contrastive learning, self-supervised and supervised variants, InfoNCE, alignment and uniformity, hard negatives.
    Sara Beery slides

    required reading: (same as previous lecture)

    optional readings:
    Alignment and uniformity
    Contrastive learning blog, covering lots of recent methods
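
    (Illustrative, not course material.) A minimal sketch of the InfoNCE loss named in the lecture above: two views of the same example are positives, and all other pairs in the batch act as negatives. The temperature is an arbitrary choice.

        import torch
        import torch.nn.functional as F

        def info_nce(z1, z2, temperature=0.1):
            # z1, z2: (batch, dim) embeddings of two augmented views of the same examples
            z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
            logits = z1 @ z2.t() / temperature    # (batch, batch) similarity matrix
            targets = torch.arange(z1.shape[0])   # positives sit on the diagonal
            return F.cross_entropy(logits, targets)

        print(info_nce(torch.randn(32, 128), torch.randn(32, 128)))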
    Week 8
    Tue 10/22 Representation learning -- theory
    + details A look at the inductive biases of architecture. Gaussian processes and the Neural Network--Gaussian Process correspondence
    Jeremy Bernstein slides

    optional readings:
    Kernel methods for DL
    DNN as Gaussian Processes
    pset 3 due
    pset 4 out

    Final project proposal guidelines out
    Thu 10/24 Generative models: Basics
    + details Density and energy models, samplers, GANs, autoregressive models, diffusion models
    Phillip Isola slides

    required reading:
    Generative Models
    Week 9
    Tue 10/29 Generative models: Representation learning meets generative modeling
    + details VAEs, latent variables
    Phillip Isola slides

    required reading:
    Generative modeling meets representation learning

    optional reading:
    VAE paper
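
    (Illustrative, not course material.) A minimal sketch of the VAE objective from the lecture above: a reconstruction term plus a KL term pulling the approximate posterior toward a standard normal prior. Architecture and dimensions are assumptions.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class VAE(nn.Module):
            def __init__(self, in_dim=784, z_dim=16):
                super().__init__()
                self.enc = nn.Linear(in_dim, 2 * z_dim)   # outputs mean and log-variance
                self.dec = nn.Linear(z_dim, in_dim)

            def forward(self, x):
                mu, logvar = self.enc(x).chunk(2, dim=-1)
                z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
                return self.dec(z), mu, logvar

        x = torch.rand(8, 784)
        recon, mu, logvar = VAE()(x)
        recon_loss = F.mse_loss(recon, x, reduction="sum") / x.shape[0]
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        loss = recon_loss + kl                    # negative ELBO (up to constants)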
    Thu 10/31 Generative models: Conditional models
    + details cGAN, cVAE, conditional diffusion models, paired and unpaired translation, image-to-image, text-to-image, text-to-text, image-to-text
    Phillip Isola slides

    optional reading:
    Conditional generative models
    Week 10
    Tue 11/5 Generalization (OOD)
    + details Exploring model generalization out of distribution, with a focus on adversarial robustness and distribution shift
    Sara Beery slides

    required reading:
    Adversarial examples
    Training robust classifiers

    optional readings:
    WILDS: A Benchmark of in-the-Wild Distribution Shifts
    Shortcuts in NN
    From ImageNet to Image Classification
    Noise or Signal
    Extrapolation
    pset 4 due
    pset 5 out
    Thu 11/7 Transfer learning: Models
    + details Finetuning, linear probes, knowledge distillation, foundation models
    Sara Beery slides

    required reading:
    Transfer learning and adaptation

    optional readings:
    A Brief Review of Domain Adaptation
    Align and Distill: Unifying and Improving Domain Adaptive Object Detection
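
    (Illustrative, not course material.) A minimal sketch of the linear-probe idea from the transfer-learning lecture above: freeze a pretrained backbone and train only a linear head on its features. The backbone here is a placeholder, not a real pretrained model.

        import torch
        import torch.nn as nn

        backbone = nn.Sequential(nn.Linear(784, 512), nn.ReLU())   # stand-in for a pretrained feature extractor
        for p in backbone.parameters():
            p.requires_grad = False                                # freeze the backbone

        probe = nn.Linear(512, 10)                                 # trainable linear classifier
        opt = torch.optim.SGD(probe.parameters(), lr=0.1)

        x, y = torch.rand(32, 784), torch.randint(0, 10, (32,))
        with torch.no_grad():
            feats = backbone(x)                                    # frozen features
        loss = nn.functional.cross_entropy(probe(feats), y)
        loss.backward()
        opt.step()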
    Week 11
    Tue 11/12 Transfer learning: Data
    + details Generative models as data++, domain adaptation, prompting
    Sara Beery slides

    required reading: (same as previous lecture)
    pset 5 due
    Thu 11/14 Scaling laws
    + details Scaling laws for different neural architectures, power laws, breaking power laws, theoretical underpinnings, critical batch size
    Phillip Isola Final project proposal due
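
    (Illustrative, not course material.) A minimal sketch of fitting a power law L(N) = a * N^(-b), the kind of relationship discussed in the scaling-laws lecture above. The data below is synthetic, purely to show the log-log fit.

        import numpy as np

        n = np.array([1e6, 1e7, 1e8, 1e9])       # e.g. parameter counts
        loss = 5.0 * n ** -0.08                  # synthetic losses following a power law

        slope, intercept = np.polyfit(np.log(n), np.log(loss), deg=1)
        b, a = -slope, np.exp(intercept)
        print(f"L(N) ~ {a:.2f} * N^(-{b:.3f})")  # recovers a = 5.0, b = 0.08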
    Week 12
    Tue 11/19 Guest Lecture: Large Language Models
    + details
    Jacob Andreas
    Thu 11/21 Guest Lecture: Deep Learning for Music
    + details
    Anna Huang
    Week 13
    Tue 11/26 TBD
    + details Advanced tools for thinking about gradient descent on arbitrary computational graphs. Includes "metrisation" and "non-dimensionalisation" of neural architecture.
    Jeremy Bernstein
    Thu 11/28 No class: Thanksgiving
    Week 14
    Tue 12/3 TBD
    + details Overview of key topics from the course; data and model scaling; LLMs and reasoning agents; automation and regulation
    Jeremy Bernstein
    Thu 12/5 Project office hours
    Week 15
    Tue 12/10 Efficient Policy Optimization Techniques for LLMs
    + details Post-training is essential for enhancing large language model (LLM) capabilities and aligning them to human preferences. One of the most widely used post-training techniques is reinforcement learning from human feedback (RLHF). In this talk, I will first discuss the challenges of applying RL to LLM training. Next, I will introduce RL algorithms that tackle these challenges by utilizing key properties of the underlying problem. Additionally, I will present an approach that simplifies the RL policy optimization process for LLMs to relative reward regression. Finally, I will extend this idea to develop a policy optimization technique for multi-turn RLHF.
    Kianté Brantly Final project due


    Collaboration policy



    AI assistants policy



    Attendance policy

  • Attendance is at your discretion. Recordings will be released here right after each class.


  • Late policy