MIT EECS 6.7960 Deep Learning
Fall 2024
Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, graph nets, transformers), geometry and invariances in deep learning, backpropagation and automatic differentiation, learning theory and generalization in high-dimensions, and applications to computer vision, natural language processing, and robotics.
Prerequisites: 18.05 and (6.3720, 6.3900, or 6.C01)
Note: This course is appropriate for advanced undergraduates and graduate students, and is 3-0-9 units. Due to heavy enrollment, we unfortunately will not be able to accept cross-registrations this semester.
jmeindl at mit dot edu
OH: Weds 1-2pm 45-344
tvbraun at mit dot edu
** class schedule is subject to change **
Date | Topics | Speaker | Course Materials | Assignments | |
Week 1 | |||||
Thu 9/5 | Course overview, introduction to deep neural networks and their basic building blocks | Sara Beery |
slides required readings: notation for this course intro to neural networks optional readings: Neural nets as distribution transformers |
||
Week 2 | |||||
Tue 9/10 | How to train a neural net. SGD, backprop and autodiff, differentiable programming (a minimal training-loop sketch follows the tutorial entries below). |
Sara Beery |
slides required readings: gradient-based learning backprop |
pset 1 out | |
Tue 9/10 (5:30-6:30PM ET) | PyTorch Tutorial | Jamie Meindl | Tutorial link | 45-230 | |
Wed 9/11 (11am-noon ET) | PyTorch Tutorial | Sharut Gupta | Tutorial link | 54-100 | |
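Not part of the official course materials: a minimal PyTorch sketch of the SGD + autodiff training loop covered in the 9/10 lecture and the tutorials. The model, data, and hyperparameters are illustrative placeholders, not the pset setup.

```python
import torch
import torch.nn as nn

# Toy regression data (illustrative only).
x = torch.randn(256, 10)
y = torch.randn(256, 1)

# A small MLP; any differentiable model trains the same way.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(100):
    opt.zero_grad()              # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backprop / reverse-mode autodiff
    opt.step()                   # SGD parameter update
```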
Thu 9/12 | Approximation theory. How well can you approximate a given function with a DNN? We will explore various facets of this question, from universal approximation to Barron's theorem, and ask whether increasing depth provably helps expressivity. |
Jeremy Bernstein |
slides optional reading: Deep learning theory notes sections 2 and 5 (this is written at a rather advanced level; try to get the intuitions rather than all the details) |
||
Week 3 | |||||
Tue 9/17 | Architectures: Grids. This lecture focuses mostly on convolutional neural networks, presenting them as a good choice when your data lies on a grid (see the sketch below this entry). |
Sara Beery |
slides
required reading: CNNs |
||
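An unofficial sketch of the grid-structured computation this lecture describes: a tiny convolutional network in PyTorch. All layer sizes, the 32x32 input, and the 10-way head are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A tiny convolutional network for 32x32 RGB images (shapes are illustrative).
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # weight sharing across the grid
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                       # global pooling -> 1x1 per channel
    nn.Flatten(),
    nn.Linear(32, 10),                             # 10-way classifier head
)

logits = net(torch.randn(8, 3, 32, 32))            # batch of 8 images
print(logits.shape)                                # torch.Size([8, 10])
```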
Thu 9/19 | Architectures: Graphs. This lecture covers graph neural networks (GNNs), showing connections to MLPs, CNNs, and message-passing algorithms. We will also discuss theoretical limitations on the expressive power of GNNs, and their practical implications (see the sketch below this entry). |
Phillip Isola |
slides
required reading: Section 5 of GRL book (mainly focus on the content through 5.1) optional readings: How Powerful are Graph Neural Networks? Distill blog on GNNs |
||
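An unofficial sketch of one round of message passing on a graph, assuming a dense 0/1 adjacency matrix and mean aggregation; real GNN libraries typically use sparse edge lists instead.

```python
import torch
import torch.nn as nn

class MeanMessagePassing(nn.Module):
    """One GNN layer: aggregate neighbor features, then update each node."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.update = nn.Linear(2 * d_in, d_out)

    def forward(self, h, adj):
        # h: (num_nodes, d_in) node features; adj: (num_nodes, num_nodes) 0/1 adjacency.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        msg = adj @ h / deg                        # mean over each node's neighbors
        return torch.relu(self.update(torch.cat([h, msg], dim=-1)))

h = torch.randn(5, 8)                              # 5 nodes, 8-dim features
adj = (torch.rand(5, 5) > 0.5).float()             # random illustrative graph
layer = MeanMessagePassing(8, 16)
print(layer(h, adj).shape)                         # torch.Size([5, 16])
```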
Week 4 | |||||
Tue 9/24 | Generalization Theory. Basic generalization theory. Overparameterization. Double descent. Inadequacy of VC dimension. Inductive biases in deep learning. |
Phillip Isola |
slides optional readings: Understanding deep learning requires rethinking generalization Double descent Probable networks and plausible predictions |
pset 1 due pset 2 out |
|
Thu 9/26 | Scaling rules for optimisation. Spectral perspective on neural computation. Feature learning and hyperparameter transfer. Scaling rules for hyperparameter transfer across width and depth. |
Jeremy Bernstein |
slides
required reading: Steepest descent |
||
Week 5 | |||||
Tue 10/1 | Architectures: Transformers. Three key ideas: tokens, attention, positional codes. Relationship between transformers and MLPs, GNNs, and CNNs -- they are all variations on the same themes! (See the attention sketch below this entry.) |
Phillip Isola |
slides required reading: Transformers (note that this reading focuses on examples from vision but you can apply the same architecture to any kind of data) |
||
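An unofficial sketch of single-head scaled dot-product self-attention, the core operation behind the "attention" idea in this lecture; positional codes, multiple heads, and the MLP blocks of a full transformer are omitted, and all dimensions are illustrative.

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project tokens to queries/keys/values
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    attn = scores.softmax(dim=-1)                    # each token attends to all tokens
    return attn @ v                                  # weighted sum of values

d = 32
tokens = torch.randn(10, d)                          # 10 tokens; no positional code here
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
print(self_attention(tokens, w_q, w_k, w_v).shape)   # torch.Size([10, 32])
```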
Thu 10/3 | Hacker's guide to DL. Practical tips mixed with opinionated anecdotes about how to get deep nets to actually do what you want. |
Phillip Isola |
slides optional readings: Recipe for training NNs Rules of ML |
||
Week 6 | |||||
Tue 10/8 | Architectures: Memory. RNNs, LSTMs, memory, sequence models (see the sketch below this entry). |
Sara Beery |
slides
required reading: RNNs optional reading: RNN Stability analysis and LSTMs |
pset 2 due pset 3 out |
|
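An unofficial sketch of recurrence as weight sharing over time, using PyTorch's built-in RNNCell; sequence length, batch size, and dimensions are illustrative.

```python
import torch
import torch.nn as nn

# Unrolling a single recurrent cell over a sequence.
cell = nn.RNNCell(input_size=8, hidden_size=16)
xs = torch.randn(20, 4, 8)                       # 20 time steps, batch of 4, 8-dim inputs
h = torch.zeros(4, 16)                           # initial hidden state (the "memory")
for x_t in xs:
    h = cell(x_t, h)                             # same weights reused at every step
print(h.shape)                                   # torch.Size([4, 16])
```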
Thu 10/10 | Representation learning: Reconstruction-based. Intro to representation learning, representations in nets and in the brain, autoencoders, clustering and VQ, self-supervised learning with reconstruction losses. |
Phillip Isola |
slides
required reading: Representation learning optional reading: Representation learning: A review |
||
Week 7 | |||||
Tue 10/15 | Student holiday | ||||
Thu 10/17 | Representation learning: Similarity-based. Metric learning, contrastive learning, self-supervised and supervised variants, InfoNCE, alignment and uniformity, hard negatives (see the InfoNCE sketch below this entry). |
Sara Beery |
slides
required reading: (same as previous lecture) optional readings: Alignment and uniformity Contrastive learning blog, covering lots of recent methods |
||
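An unofficial sketch of the InfoNCE objective with in-batch negatives, assuming two embedded views of the same batch of examples; the temperature value is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE with in-batch negatives: row i of z1 should match row i of z2."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature           # pairwise cosine similarities
    targets = torch.arange(z1.shape[0])          # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)   # embeddings of two views of 64 examples
print(info_nce(z1, z2))
```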
Week 8 | |||||
Tue 10/22 | Representation learning: Theory. A look at the inductive biases of architecture. Gaussian processes and the Neural Network--Gaussian Process correspondence. |
Jeremy Bernstein |
slides
optional readings: Kernel methods for DL DNN as Gaussian Processes |
pset 3 due pset 4 out Final project proposal guidelines out |
|
Thu 10/24 | Generative models: Basics. Density and energy models, samplers, GANs, autoregressive models, diffusion models. |
Phillip Isola |
slides
required reading: Generative Models |
||
Week 9 | |||||
Tue 10/29 | Generative models: Representation learning meets generative modeling. VAEs, latent variables (see the VAE sketch below this entry). |
Phillip Isola |
slides required reading: Generative modeling meets representation learning optional reading: VAE paper |
||
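An unofficial sketch of the VAE objective (reconstruction plus KL, via the reparameterization trick), using single linear layers as stand-ins for the encoder and decoder; all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal VAE pieces: encoder outputs (mu, log_var), decoder reconstructs x.
enc = nn.Linear(784, 2 * 16)                     # 16-dim latent (illustrative sizes)
dec = nn.Linear(16, 784)

x = torch.rand(32, 784)                          # fake "images" with values in [0, 1]
mu, log_var = enc(x).chunk(2, dim=-1)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterization trick
recon = torch.sigmoid(dec(z))

recon_loss = F.binary_cross_entropy(recon, x, reduction="sum") / x.shape[0]
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / x.shape[0]
loss = recon_loss + kl                           # negative ELBO (per example)
```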
Thu 10/31 | Generative models: Conditional models. cGAN, cVAE, conditional diffusion models, paired and unpaired translation, image-to-image, text-to-image, text-to-text, image-to-text. |
Phillip Isola |
slides
optional reading: Conditional generative models |
||
Week 10 | |||||
Tue 11/5 | Generalization (OOD). Exploring model generalization out of distribution, with a focus on adversarial robustness and distribution shift. |
Sara Beery |
slides
required reading: Adversarial examples Training robust classifiers optional readings: WILDS: A Benchmark of in-the-Wild Distribution Shifts Shortcuts in NN From ImageNet to Image Classification Noise or Signal Extrapolation |
pset 4 due pset 5 out |
|
Thu 11/7 | Transfer learning: Models. Finetuning, linear probes, knowledge distillation, foundation models (see the linear-probe sketch below this entry). |
Sara Beery |
slides
required reading: Transfer learning and adaptation optional readings: A Brief Review of Domain Adaptation Align and Distill: Unifying and Improving Domain Adaptive Object Detection |
||
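An unofficial sketch of a linear probe: freeze a backbone (here a randomly initialized stand-in for a pretrained model) and train only a new linear head; full finetuning would instead leave all parameters trainable.

```python
import torch
import torch.nn as nn

# Linear probe: frozen backbone, trainable linear head.
backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU())   # stand-in for a pretrained model
for p in backbone.parameters():
    p.requires_grad = False                      # frozen features

probe = nn.Linear(256, 10)                       # the only trainable parameters
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

x, y = torch.randn(128, 784), torch.randint(0, 10, (128,))
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(backbone(x)), y)
    loss.backward()
    opt.step()
```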
Week 11 | |||||
Tue 11/12 | Transfer learning: Data. Generative models as data++, domain adaptation, prompting. |
Sara Beery |
slides
required reading: (same as previous lecture) |
pset 5 due | |
Thu 11/14 | Scaling laws. Scaling laws for different neural architectures, power laws, breaking power laws, theoretical underpinnings, critical batch size. |
Phillip Isola | Final project proposal due | ||
Week 12 | |||||
Tue 11/19 | Guest Lecture: Large Language Models |
Jacob Andreas | |||
Thu 11/21 | Guest Lecture: Deep Learning for Music |
Anna Huang | |||
Week 13 | |||||
Tue 11/26 | TBD. Advanced tools for thinking about gradient descent on arbitrary computational graphs. Includes "metrisation" and "non-dimensionalisation" of neural architecture. |
Jeremy Bernstein | |||
Thu 11/28 | No class: Thanksgiving | ||||
Week 14 | |||||
Tue 12/3 | TBD. Overview of key topics from the course; data and model scaling; LLMs and reasoning agents; automation and regulation. |
Jeremy Bernstein | |||
Thu 12/5 | Project office hours | ||||
Week 15 | |||||
Tue 12/10 | Efficient Policy Optimization Techniques for LLMs. Post-training is essential for enhancing large language model (LLM) capabilities and aligning them with human preferences. One of the most widely used post-training techniques is reinforcement learning from human feedback (RLHF). In this talk, I will first discuss the challenges of applying RL to LLM training. Next, I will introduce RL algorithms that tackle these challenges by exploiting key properties of the underlying problem. I will also present an approach that simplifies RL policy optimization for LLMs to relative reward regression. Finally, I will extend this idea to a policy optimization technique for multi-turn RLHF. |
Kianté Brantley | Final project due