MIT EECS 6.S898 Deep Learning
Fall 2023
Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, graph nets, transformers), geometry and invariances in deep learning, backpropagation and automatic differentiation, learning theory and generalization in high dimensions, and applications to computer vision, natural language processing, and robotics.
Pre-requisites: (6.3900 [6.036] or 6.C01 or 6.3720 [6.401]) and (6.3700 [6.041] or 6.3800 [6.008] or 18.05) and (18.C06 or 18.06)
Note: This course is appropriate for advanced undergraduates and graduate students, and is 3-0-9 units. For non-students who want access to Piazza or Canvas, email Anthea Li (yichenl@mit.edu) to be added manually. For non-MIT students, refer to cross-registration.
** class schedule is subject to change **
| Date | Topics | Speaker | Course Materials | Assignments |
| --- | --- | --- | --- | --- |
| Week 1 | | | | |
| Thu 9/7 | Course overview, introduction to deep neural networks and their basic building blocks | Sara Beery | slides; notation for this course; notes; optional reading: Neural nets as distribution transformers | |
| Week 2 | | | | |
| Tue 9/12 | How to train a neural net: SGD, backprop and autodiff, differentiable programming (see the short PyTorch training-loop sketch after the schedule) | Sara Beery | slides; required reading: gradient-based learning; required reading: backprop | pset 1 out |
| Tue 9/12 (5-6PM ET) | PyTorch Tutorial | Saachi Jain | Tutorial link | 32-D463 Star Room in Stata |
| Wed 9/13 (10-11AM ET) | PyTorch Tutorial | Anthea Li | Tutorial link | 32-D463 Star Room in Stata |
| Thu 9/14 | Approximation theory: how well can you approximate a given function by a DNN? We will explore various facets of this question, from universal approximation to Barron's theorem. And does increasing the depth provably help expressivity? | Phillip Isola | slides; optional reading: Deep learning theory notes, sections 2 and 5 (written at a rather advanced level; try to get the intuitions rather than all the details) | |
| Week 3 | | | | |
| Tue 9/19 | Architectures: Grids. This lecture will focus mostly on convolutional neural networks, presenting them as a good choice when your data lies on a grid. | Phillip Isola | slides; required reading: CNNs | |
| Thu 9/21 | Architectures: Graphs. This lecture covers graph neural networks (GNNs), showing their connections to MLPs, CNNs, and message passing algorithms. We will also discuss theoretical limitations on the expressive power of GNNs and the practical implications of these limits. | Phillip Isola | slides; required reading: Section 5 of GRL book; optional reading: How Powerful are Graph Neural Networks; optional reading: Distill blog on GNNs | |
| Week 4 | | | | |
| Tue 9/26 | Scaling rules for optimisation: a spectral perspective on neural computation; feature learning and hyperparameter transfer; scaling rules for hyperparameter transfer across width and depth. | Jeremy Bernstein | slides | pset 1 due; pset 2 out |
| Thu 9/28 | Bayesian analysis of learning and generalisation: over-parameterisation, the inadequacy of VC dimension, the Bayesian perspective, PAC-Bayes theory. | Jeremy Bernstein | slides; optional reading: Understanding deep learning requires rethinking generalization; optional reading: Probable networks and plausible predictions | |
| Week 5 | | | | |
| Tue 10/3 | Guest Lecture: Tess Smidt. Symmetry can occur in many forms. For physical systems in 3D, we have the freedom to choose any coordinate system, so any physical property must transform predictably under elements of Euclidean symmetry (3D rotations, translations, and inversion). For algorithms involving the nodes and edges of graphs, we have symmetry under permutation of how the nodes and edges are ordered in computer memory. Unless coded otherwise, machine-learned models make no assumptions about the symmetry of a problem and will be sensitive to, e.g., an arbitrary choice of coordinate system or ordering of nodes and edges in an array. One of the primary motivations for explicitly treating symmetry in machine learning models is to eliminate the need for data augmentation. Another is that by encoding symmetry into a method, we get the guarantee that the model will give the "same" answer for an example and a "symmetrically equivalent" example, even if the model was not explicitly trained on the "symmetrically equivalent" example. In this lecture, we will discuss several ways to make machine learning models "symmetry-aware" (e.g., via the input representation vs. the loss vs. the model architecture). We will focus on how to handle 3D Euclidean symmetry and permutation symmetry in neural networks, describe unintuitive and beneficial consequences of these symmetries, and discuss how to set up training tasks that are compatible with your assumptions of symmetry. | Tess Smidt | slides | |
| Thu 10/5 | Hacker's guide to DL | Phillip Isola | slides; optional reading: A recipe for training NNs; optional reading: Rules of ML | |
| Week 6 | | | | |
| Tue 10/10 | Student holiday | | | |
| Wed 10/11 | | | | pset 2 due; pset 3 out |
| Thu 10/12 | Architectures -- transformers. Three key ideas: tokens, attention, positional codes. Relationship between transformers and MLPs, GNNs, and CNNs -- they are all variations on the same themes! (See the attention sketch after the schedule.) | Sara Beery | slides; required reading: Transformers (note that this reading focuses on examples from vision, but you can apply the same architecture to any kind of data) | |
| Week 7 | | | | |
| Tue 10/17 | Architectures -- memory: RNNs, LSTMs, memory, sequence models. | Sara Beery | slides; required reading: RNNs; optional reading: RNN stability analysis and LSTMs | |
| Thu 10/19 | Representation learning -- reconstruction-based: intro to representation learning, representations in nets and in the brain, autoencoders, clustering and VQ, self-supervised learning with reconstruction losses. | Phillip Isola | slides; required reading: Representation learning; optional reading: Representation learning: A review | |
| Fri 10/20 | Final project proposal guidelines out | | | |
| Week 8 | | | | |
| Tue 10/24 | Representation learning -- similarity-based: unsupervised and weakly supervised learning, primarily through the lens of similarity-driven learning. We will briefly cover metric learning before moving on to self-supervised learning, with a focus on contrastive learning (the modern cousin of metric learning). | Sara Beery | slides; required reading: (same as previous lecture); optional reading: Contrastive feature alignment; optional reading: Contrastive learning | pset 3 due; pset 4 out |
| Thu 10/26 | Representation learning -- theory | Jeremy Bernstein | slides; optional reading: Kernel methods for DL; optional reading: DNNs as Gaussian Processes | |
| Week 9 | | | | |
| Tue 10/31 | Generative models -- basics: density and energy models, samplers, GANs, autoregressive models, diffusion models. | Phillip Isola | slides; required reading: Generative Models | |
| Thu 11/2 | Generative models -- representation learning meets generative modeling: VAEs, latent variables. | Phillip Isola | slides; required reading: Generative modeling meets representation learning; optional reading: VAE paper | |
| Week 10 | | | | |
| Tue 11/7 | Generative models -- conditional models: cGAN, cVAE, paired and unpaired translation, image-to-image, text-to-image, world models. | Phillip Isola | slides; required reading: Conditional generative models | pset 4 due; pset 5 out |
| Thu 11/9 | Generalization (OOD) | Sara Beery | slides; required reading: Adversarial examples; required reading: Training robust classifiers; required reading: WILDS: A Benchmark of in-the-Wild Distribution Shifts; optional reading: Shortcuts in NN; optional reading: From ImageNet to Image Classification; optional reading: Noise or Signal; optional reading: Extrapolation | |
| Fri 11/10 | Project proposal due @ 11:59PM EST | | | |
| Week 11 | | | | |
| Tue 11/14 | Transfer learning -- models: finetuning, linear probes, knowledge distillation, foundation models. | Sara Beery | slides; required reading: Transfer learning and adaptation | pset 5 due |
| Thu 11/16 | Transfer learning -- data: generative models as data++, domain adaptation, prompting. | Sara Beery | slides; required reading: (same as previous lecture) | |
| Week 12 | | | | |
| Tue 11/21 | Guest lecture: Large Language Models | Yoon Kim | | |
| Thu 11/23 | No class: Thanksgiving | | | |
| Week 13 | | | | |
| Tue 11/28 | Scaling laws: scaling laws for different neural architectures, power laws, breaking power laws, theoretical underpinnings, critical batch size. | Phillip Isola | slides; required reading: Scaling Laws for Neural Language Models; optional reading: Chinchilla scaling laws; optional reading: Data manifold argument; optional reading: Breaking power laws via data pruning; optional reading: Critical batch size | |
| Thu 11/30 | Automatic gradient descent: advanced tools for thinking about gradient descent on arbitrary computational graphs, including "metrisation" and "non-dimensionalisation" of neural architecture. | Jeremy Bernstein | slides | |
| Week 14 | | | | |
| Tue 12/5 | Project office hours | | | |
| Thu 12/7 | Deploying computer vision systems - A case study on birdsong identification (details TBA) | Grant Van Horn | | |
| Week 15 | | | | |
| Tue 12/12 | Past & future of deep learning: overview of key topics from the course; data and model scaling; LLMs and reasoning agents; automation and regulation. | Jeremy Bernstein | | Final project due |
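
As a companion to the "How to train a neural net" lecture (Tue 9/12), here is a minimal sketch of the training loop it covers: a small network's forward pass builds a computation graph, autodiff (PyTorch autograd) backpropagates gradients, and SGD updates the parameters. The model, synthetic data, and hyperparameters below are illustrative placeholders, not taken from the lecture or the psets.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (illustrative only).
x = torch.randn(256, 10)
y = torch.randn(256, 1)

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass builds the computation graph
    loss.backward()              # backprop: autodiff fills in each parameter's .grad
    optimizer.step()             # SGD update: p <- p - lr * p.grad
```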
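Similarly, here is a minimal sketch of single-head scaled dot-product attention, the core operation in the transformers lecture (Thu 10/12): tokens score each other via query/key similarities and mix their value vectors accordingly. Positional codes and multi-head projections are omitted, and the shapes and random projection matrices are illustrative.

```python
import math
import torch

def attention(q, k, v):
    """q, k, v: (num_tokens, d). Each output token is a weighted mix of the values."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])  # token-token similarities
    weights = scores.softmax(dim=-1)                           # each row sums to 1
    return weights @ v

tokens = torch.randn(5, 16)                             # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (torch.randn(16, 16) for _ in range(3))    # illustrative projections
out = attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)  # -> (5, 16)
```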