6.S898 Deep Learning, Fall 2023

MIT EECS 6.S898 Deep Learning
Fall 2022
[ Schedule \| Policies \| Piazza \| Canvas \| Previous years ]

Course Overview

Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, transformers), backpropagation and automatic differentiation, learning theory and generalization in high-dimensions, and applications to computer vision, natural language processing, and robotics.

Pre-requisites: (6.3900 [6.036] or 6.C01 or 6.3720 [6.401]) and (6.3700[6.041] or 6.3800 [6.008] or 18.05) and (18.C06 or 18.06)

Note: This is course is appropriate for advanced undergraduates and graduate students, and is 3-0-9 units. For non-students who want access to Piazza or Canvas, email Aidan Curtis (curtisa@mit.edu) to be added manually. For non-MIT students, refer to cross-registeration.

Lectures will be in-person only; if there is an important reason you cannot make class, you may email Aidan Curtis (curtisa@mit.edu) to get a recording.

Course Information

Instructor Phillip Isola

phillipi at mit dot edu

OH: Thu 2:30pm-3:30pm (2-146).

Instructor Stefanie Jegelka

stefje at csail dot mit dot edu

OH: Thu 2:30pm-3:30pm (2-146).

TA Tongzhou Wang

tongzhou at mit dot edu

OH: Mon 1:00pm-2:00pm (24-317).

TA Aidan Curtis

curtisa at mit dot edu

OH: Mon 1:00pm-2:00pm (24-317).

- Logistics

Class meetings: Tuesday, Thursday 1:00 - 2:30 PM in room 4-231.
We will be using both Piazza and Canvas for announcements.
Contact instructors/TA via private Piazza post or e-mail.

- Grading Policy

5% participation

Class attendance and participation during lectures and on Piazza.

65% problem sets

30% final project

Collaboration policy

Late policy

Class Schedule

** class schedule is subject to change **

Date	Topics	Speaker	Course Materials	Assignments
Week 1
Thu 9/8	Course overview, introduction to deep neural networks and their basic building blocks	Phillip Isola & Stefanie Jegelka	slides notes
Week 2
Tue 9/13	How to train a neural net + details SGD, Backprop and autodiff, differentiable programming	Phillip Isola	slides notes	pset 1 out
Thu 9/15	Approximation theory + details How well can you approximate a given function by a DNN? We will explore various facets of this issue, from universal approximation to Barron's theorem. And does increasing the depth provably help for expressivity?	Stefanie Jegelka	slides notes
Week 3
Tue 9/20	Generalization theory (IID) + details We will start by briefly discussing the classical approach to generalization bounds, large margin theory, and complexity of neural networks. We then discuss recent interpolation results, the double or multiple-descent phenomenon, and the linear regime in overparametrized neural networks.	Stefanie Jegelka	slides double descent + optional notes deep double descent rethinking benign overfit simple case (§2.3)
Thu 9/22	Architectures -- Grids + details CNNs	Phillip Isola	slides notes
Week 4
Tue 9/27	Architectures -- Graphs + details GNNs	Stefanie Jegelka	slides notes: GNNs (§5, optional §7.3) representation power of GNNs + optional notes GNN intro part 1 part 2 GCN GAT Neural message passing for quantum chemistry (§2) GNN representation theory	pset 1 due pset 2 out
Thu 9/29	Geometric deep learning + details Inductive biases of archs, invariances and equivariances	Stefanie Jegelka	slides Geometric DL (§3, rest optional) + optional notes Geometric DL blog Deep sets Equivariance in DL
Week 5
Tue 10/4	Hacker's guide to DL + details In this lecture, we'll discuss the practical side of developing deep learning systems. We will focus on best practices, common mistakes to look for, and evaluation methods for developing deep learning models. While optimization methods and software design practices for Deep Learning are still under development, this lecture will try to present several tried and true implementation and debugging strategies for diagnosing failures in model training and help make model training less painful in the future.	Phillip Isola	slides notes
Thu 10/6	Architectures -- transformers + details Transformers. Three key ideas: tokens, attention, positional codes. Relationship between transformers and MLPS, GNNs, and CNNs -- they are all variations on the same themes!	Phillip Isola	slides notes
Week 6
Wed 10/12				pset 3 out
Thu 10/13	Architectures -- memory + details RNNs, LSTMs, memory, sequence models.	Phillip Isola	slides notes
Week 7
Tue 10/18	Representation learning -- reconstruction-based + details Intro to representation learning, representations in nets and in the brain, autoencoders, clustering and VQ, self-supevised learning with reconstruction losses.	Phillip Isola	slides notes notes (optional)
Thu 10/20	Representation learning -- similarity-based + details In this lecture, we will talk about unsupervised and weakly supervised learning, primarily through the lens of similarity driven learning. I’ll briefly talk about metric learning first, before moving onto self-supervised learning with a focus on contrastive learning (the modern cousin of metric learning).	Stefanie Jegelka	notes contrastive feature geometry (align+uniform) contrastive learning	pset 2 due pset 3 due pset 4 out
Week 8
Tue 10/25	Representation learning -- theory + details	Stefanie Jegelka	slides inductive bias (negative results; optional) simplicity bias (low-rank; optional) pitfalls of simplicity bias (optional) + more optional notes simplicity bias (SGD; function complexity) factors affecting features shortcuts in contrastive learning
Thu 10/27	DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking [Guest Lecture]	Gabriele Corso	slides paper (optional)
Week 9
Tue 11/1	Generative models -- basics + details Density and energy models, samplers, GANs, autoregressive models, diffusion models	Phillip Isola	slides notes denoising diffusion (optional) diffusion blog (optional)
Thu 11/3	Generative models -- representation learning meets generative modeling + details VAEs, latent variables	Phillip Isola	slides notes	pset 4 due pset 5 out project handout
Week 10
Tue 11/8	Generative models --- conditional models + details cGAN, cVAE, paired and unpaired translation, image-to-image, text-to-image, world models	Phillip Isola	slides notes
Thu 11/10	Generalization (OOD) + details	Stefanie Jegelka	slides notes (adv. examples) notes (robust opt.) notes (shortcuts in NNs; optional) notes (extrapolation; optional)
Fri 11/11				project proposal due
Week 11
Tue 11/15	Transfer learning -- models + details Finetuning, linear probes, knowledge distillation, foundation models	Phillip Isola	slides notes notes (foundation models; optional)
Thu 11/17	Transfer learning -- data + details Generative models as data++, domain adaptation, prompting	Phillip Isola	slides notes (MAML) notes (DatasetGAN)	pset 5 due
Week 12
Tue 11/22	Scaling laws + details	Stefanie Jegelka	slides
Week 13
Tue 11/29	Curiosities about NN optimization and stability + details	Stefanie Jegelka	slides notes (edge of stability; optional) notes (unstable convergence; optional) notes (stability; optional) notes (stability of SGD; optional) notes (convergence to invariant measure; Sec. 1-3; optional) notes (statistical algorithmic stability; Sec. 1-3; optional)
Thu 12/1	Energy-efficient deep learning + details	Vivienne Sze	slides
Week 14
Tue 12/6	Toward Responsibly-Deployable Deep Learning + details	Tom Hartvigsen
Thu 12/8	No lecture OH at the usual lecture location and hour
Week 15
Tue 12/13	Poster session (1pm to 3pm) Grier Room (34-401)			Final project (blog + poster) due

Collaboration policy

Psets should be written up individually and should reflect your own individual work. However, you may discuss with your peers, TAs, and instructors.
You should not copy or share complete solutions.
If you work with anyone on the pset (other than TAs and instructors), list their names at the top of the pset.

Late policy

If your pset is submitted within 7 days (rounding up) of the original deadline, you will receive partial credit.
Your late penalty is reflected as a multiplier on the grade you get for this pset. This multiplier decreases linearly from 1.0 (full credit) for an on time assignment, to 0.5 (half credit) for an assignment 7 days late.
Mathematically, if you earn p points, and submit t hours late (between 0 and 168 hours [= 7 days]), your modified grade will be: (1.0 - 0.5t/168)p. We will round up to the nearest hour.

Additionally, you have 3 slack days for the semester. We will waive up to 3 days worth of late penalties. This only waives existing late penalties - it CANNOT be used to extend a PSet past a week! We will automatically choose which assignments the late days get applied to, to maximize your grade.

Previous years

Fall 2021

MIT EECS

Fall 2022

Course Overview

Course Information

Instructor Phillip Isola

Instructor Stefanie Jegelka

TA Tongzhou Wang

TA Aidan Curtis

- Logistics

- Grading Policy

Class Schedule

Collaboration policy

Late policy

Previous years