6.S898 Deep Learning, Fall 2023

MIT EECS 6.S898 Deep Learning
Fall 2023

Course Overview

Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, graph nets, transformers), geometry and invariances in deep learning, backpropagation and automatic differentiation, learning theory and generalization in high-dimensions, and applications to computer vision, natural language processing, and robotics.

Pre-requisites: (6.3900 [6.036] or 6.C01 or 6.3720 [6.401]) and (6.3700 [6.041] or 6.3800 [6.008] or 18.05) and (18.C06 or 18.06)

Note: This course is appropriate for advanced undergraduates and graduate students, and is 3-0-9 units. For non-students who want access to Piazza or Canvas, email Anthea Li (yichenl@mit.edu) to be added manually. For non-MIT students, refer to cross-registration.

Course Information

Instructor Phillip Isola

phillipi at mit dot edu

OH: Mon. 2-3PM, Rm. 34-302

Instructor Sara Beery

beery at mit dot edu

OH: Tues. 9-10AM, Rm. 36-153

Instructor Jeremy Bernstein

jbernstein at mit dot edu

OH: Tues. 4-5PM, Rm. 34-302

Logistics

Class meetings: Tuesday, Thursday 1:00 - 2:30 PM in room 2-190.
We will be using both Piazza and Canvas for announcements.
Refer to: Piazza (all questions), Canvas (announcements and homework release), Gradescope (HW submission and grades).
For personal logistical questions, contact the instructors and TAs via a private Piazza post or via e-mail (we prefer Piazza).

Grading Policy

65% problem sets

35% final project

The final project will be a research project on a deep learning topic of your choice.
You will run experiments and do analysis to explore your research question. You will then write up your research in the format of a blog post, which will include an explanation of the background material, the new investigations, and the results you found.
You are encouraged to include plots, animations, and interactive graphics to make your findings clear. Some examples of nice research blog posts are here: [1] [2] [3] [4].
The final project will be graded for clarity and insight as well as novelty and depth of the experiments and analysis. Detailed guidance will be given later in the semester.


Class Schedule

** class schedule is subject to change **

Date	Topics	Speaker	Course Materials	Assignments
Week 1
Thu 9/7	Course overview, introduction to deep neural networks and their basic building blocks	Sara Beery	slides notation for this course notes optional reading: Neural nets as distribution transformers
Week 2
Tue 9/12	How to train a neural net. SGD, backprop and autodiff, differentiable programming.	Sara Beery	slides required reading: gradient-based learning required reading: backprop	pset 1 out
Tue 9/12 (5-6PM ET)	PyTorch Tutorial	Saachi Jain	Tutorial link	32-D463 Star Room in Stata
Wed 9/13 (10-11AM ET)	PyTorch Tutorial	Anthea Li	Tutorial link	32-D463 Star Room in Stata
Thu 9/14	Approximation theory. How well can you approximate a given function by a DNN? We will explore various facets of this issue, from universal approximation to Barron's theorem. And does increasing the depth provably help for expressivity?	Phillip Isola	slides optional reading: Deep learning theory notes sections 2 and 5 (this is written at a rather advanced level; try to get the intuitions rather than all the details)
Week 3
Tue 9/19	Architectures: Grids. This lecture will focus mostly on convolutional neural networks, presenting them as a good choice when your data lies on a grid.	Phillip Isola	slides required reading: CNNs
Thu 9/21	Architectures: Graphs. This lecture covers graph neural networks (GNNs), showing connections to MLPs and CNNs and message passing algorithms. We will also discuss theoretical limitations on the expressive power of GNNs and their practical implications.	Phillip Isola	slides required reading: Section 5 of GRL book optional reading: How Powerful are Graph Neural Networks optional reading: Distill blog on GNNs
Week 4
Tue 9/26	Scaling rules for optimisation. Spectral perspective on neural computation. Feature learning and hyperparameter transfer. Scaling rules for hyperparameter transfer across width and depth.	Jeremy Bernstein	slides	pset 1 due pset 2 out
Thu 9/28	Bayesian analysis of learning and generalisation. Over-parameterisation. Inadequacy of VC dimension. Bayesian perspective. PAC-Bayes theory.	Jeremy Bernstein	slides optional reading: Understanding deep learning requires rethinking generalization optional reading: Probable networks and plausible predictions
Week 5
Tue 10/3	Guest Lecture: Tess Smidt. Symmetry can occur in many forms. For physical systems in 3D, we have the freedom to choose any coordinate system and therefore any physical property must transform predictably under elements of Euclidean symmetry (3D rotations, translations and inversion). For algorithms involving the nodes and edges of graphs, we have symmetry under permutation of how the nodes and edges are ordered in computer memory. Unless coded otherwise, machine learned models make no assumptions about the symmetry of a problem and will be sensitive to e.g. an arbitrary choice of coordinate system or ordering of nodes and edges in an array. One of the primary motivations of explicitly treating symmetry in machine learning models is to eliminate the need for data augmentation. Another motivation is that by encoding symmetry into a method, we get the guarantee that the model will give the "same" answer for an example and a "symmetrically equivalent" example even if the model was not explicitly trained on the "symmetrically equivalent" example. In this lecture, we will discuss several ways to make machine learning models "symmetry-aware" (e.g. input representation vs. loss vs. model architecture). We will focus on how to handle 3D Euclidean symmetry and permutation symmetry in neural networks, describe unintuitive and beneficial consequences of these symmetries, and discuss how to set up training tasks that are compatible with your assumptions of symmetry.	Tess Smidt	slides
Thu 10/5	Hacker's guide to DL	Phillip Isola	slides optional reading: Recipe for training NNs optional reading: Rules of ML
Week 6
Tue 10/10	Student holiday
Wed 10/11				pset 2 due pset 3 out
Thu 10/12	Architectures -- transformers. Three key ideas: tokens, attention, positional codes. Relationship between transformers and MLPs, GNNs, and CNNs -- they are all variations on the same themes!	Sara Beery	slides required reading: Transformers (note that this reading focuses on examples from vision but you can apply the same architecture to any kind of data)
Week 7
Tue 10/17	Architectures -- memory. RNNs, LSTMs, memory, sequence models.	Sara Beery	slides required reading: RNNs optional reading: RNN Stability analysis and LSTMs
Thu 10/19	Representation learning -- reconstruction-based. Intro to representation learning, representations in nets and in the brain, autoencoders, clustering and VQ, self-supervised learning with reconstruction losses.	Phillip Isola	slides required reading: Representation learning optional reading: Representation learning: A review
Fri 10/20				Final project proposal guidelines out
Week 8
Tue 10/24	Representation learning -- similarity-based. In this lecture, we will talk about unsupervised and weakly supervised learning, primarily through the lens of similarity-driven learning. I’ll briefly talk about metric learning first, before moving on to self-supervised learning with a focus on contrastive learning (the modern cousin of metric learning).	Sara Beery	slides required reading: (same as previous lecture) optional reading: Contrastive feature alignment optional reading: Contrastive learning	pset 3 due pset 4 out
Thu 10/26	Representation learning -- theory	Jeremy Bernstein	slides optional reading: Kernel methods for DL optional reading: DNN as Gaussian Processes
Week 9
Tue 10/31	Generative models -- basics. Density and energy models, samplers, GANs, autoregressive models, diffusion models.	Phillip Isola	slides required reading: Generative Models
Thu 11/2	Generative models -- representation learning meets generative modeling. VAEs, latent variables.	Phillip Isola	slides required reading: Generative modeling meets representation learning optional reading: VAE paper
Week 10
Tue 11/7	Generative models -- conditional models. cGAN, cVAE, paired and unpaired translation, image-to-image, text-to-image, world models.	Phillip Isola	slides required reading: Conditional generative models	pset 4 due pset 5 out
Thu 11/9	Generalization (OOD)	Sara Beery	slides required reading: Adversarial examples required reading: Training robust classifiers required reading: WILDS: A Benchmark of in-the-Wild Distribution Shifts optional reading: Shortcuts in NN optional reading: From ImageNet to Image Classification optional reading: Noise or Signal optional reading: Extrapolation
Fri 11/10				project proposal due @ 11:59PM EST
Week 11
Tue 11/14	Transfer learning -- models. Finetuning, linear probes, knowledge distillation, foundation models.	Sara Beery	slides required reading: Transfer learning and adaptation	pset 5 due
Thu 11/16	Transfer learning -- data. Generative models as data++, domain adaptation, prompting.	Sara Beery	slides required reading: (same as previous lecture)
Week 12
Tue 11/21	Guest lecture: Large Language Models	Yoon Kim
Thu 11/23	No class: Thanksgiving
Week 13
Tue 11/28	Scaling laws. Scaling laws for different neural architectures, power laws, breaking power laws, theoretical underpinnings, critical batch size.	Phillip Isola	slides required reading: Scaling Laws for Neural Language Models optional reading: Chinchilla scaling laws optional reading: Data manifold argument optional reading: Breaking power laws via data pruning optional reading: Critical batch size
Thu 11/30	Automatic gradient descent. Advanced tools for thinking about gradient descent on arbitrary computational graphs. Includes "metrisation" and "non-dimensionalisation" of neural architecture.	Jeremy Bernstein	slides
Week 14
Tue 12/5	Project office hours
Thu 12/7	Deploying computer vision systems - A case study on birdsong identification (details TBA)	Grant Van Horn
Week 15
Tue 12/12	Past & future of deep learning. Overview of key topics from the course; data and model scaling; LLMs and reasoning agents; automation and regulation.	Jeremy Bernstein		Final project due

Collaboration policy

Psets should be written up individually and should reflect your own individual work. However, you may discuss with your peers, TAs, and instructors.
You should not copy or share complete solutions.
If you work with anyone on the pset (other than TAs and instructors), list their names at the top of the pset.

AI assistants policy

Our policy for using ChatGPT and other AI assistants is identical to our policy for using human assistants.
This is a deep learning class and you should try out all the latest AI assistants (they are pretty much all using deep learning). It's very important to play with them to learn what they can do and what they can't do. That's a part of the content of this course.
Just like you can come to office hours and ask a human questions (about the lecture material, clarifications about pset questions, tips for getting started, etc), you are very welcome to do the same with AI assistants.
But: just like you are not allowed to ask an expert friend to do your homework for you, you also should not ask an expert AI.
If it is ever unclear, just imagine the AI as a human and apply the same norm as you would with a human.
If you work with any AI on a pset, briefly describe which AI and how you used it at the top of the pset (a few sentences is enough).

Late policy

Homeworks will not be accepted more than 7 days after the deadline.
The grade on a homework received n days after the deadline (n<=7) will be multiplied by (1-n/14). We will round up to units of full days; submitting 1 hour late counts as using 1 late day.
Ten penalty days will be automatically waived for each student.
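
As an illustration of the late-penalty arithmetic above, here is a minimal Python sketch. The function names and the hours-based input are assumptions made for this example, not part of the course tooling. It deliberately does not model the ten waived penalty days, since the policy text does not say how they are allocated across homeworks.

```python
import math

MAX_LATE_DAYS = 7  # homework is not accepted more than 7 days after the deadline


def late_days(hours_late: float) -> int:
    """Round lateness up to whole days; on-time work counts as 0 days."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def late_multiplier(hours_late: float) -> float:
    """Grade multiplier (1 - n/14) for a submission n days late, n <= 7.

    Returns 0.0 when the submission is past the 7-day acceptance window.
    Waived penalty days are NOT modeled here (assumption of this sketch).
    """
    n = late_days(hours_late)
    if n > MAX_LATE_DAYS:
        return 0.0
    return 1 - n / 14


if __name__ == "__main__":
    # One hour late is rounded up to one full late day.
    print(late_days(1), late_multiplier(1))
    # Exactly seven days late is the last accepted case.
    print(late_days(168), late_multiplier(168))
```

Under these assumptions, a submission one hour late is scaled by 13/14, and one exactly seven days late is scaled by 1/2.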

GradSupport

Previous years

Fall 2022

Fall 2021


TA Anthea Li

TA Thien Le

TA Saachi Jain

TA Veevee Cai

TA Pratyusha Sharma

TA Jocelyn Shen
