MIT EECS 6.7960 Deep Learning
Fall 2024
Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, graph nets, transformers), geometry and invariances in deep learning, backpropagation and automatic differentiation, learning theory and generalization in high-dimensions, and applications to computer vision, natural language processing, and robotics.
Prerequisites: 18.05 and (6.3720, 6.3900, or 6.C01)
Note: This course is appropriate for advanced undergraduates and graduate students, and is 3-0-9 units. Due to heavy enrollment, we unfortunately will not be able to accept cross-registrations this semester.
jmeindl at mit dot edu
OH: Weds 1-2pm 45-344
tvbraun at mit dot edu
** class schedule is subject to change **
Date | Topics | Speaker | Course Materials | Assignments | |
Week 1 | |||||
Thu 9/5 | Course overview, introduction to deep neural networks and their basic building blocks | Sara Beery |
slides required readings: notation for this course intro to neural networks optional readings: Neural nets as distribution transformers |
||
Week 2 | |||||
Tue 9/10 | How to train a neural net. SGD, backprop and autodiff, differentiable programming (a minimal training-loop sketch follows the tutorial entries below). |
Sara Beery |
slides required readings: gradient-based learning backprop |
pset 1 out | |
Tue 9/10 (5:30-6:30PM ET) | PyTorch Tutorial | Jamie Meindl | Tutorial link | 45-230 | |
Wed 9/11 (11am-noon ET) | PyTorch Tutorial | Sharut Gupta | Tutorial link | 54-100 | |
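Not part of the official course materials: a minimal PyTorch sketch of the SGD + autodiff training loop covered in the 9/10 lecture and the tutorials. The model, data, and hyperparameters are illustrative placeholders, not the pset setup.

```python
import torch
import torch.nn as nn

# Toy regression data (illustrative only).
x = torch.randn(256, 10)
y = torch.randn(256, 1)

# A small MLP; any differentiable model trains the same way.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(100):
    opt.zero_grad()              # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backprop / reverse-mode autodiff
    opt.step()                   # SGD parameter update
```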
Thu 9/12 | Approximation theory. How well can you approximate a given function with a DNN? We will explore various facets of this question, from universal approximation to Barron's theorem, and ask whether increasing depth provably helps expressivity. |
Jeremy Bernstein |
slides optional reading: Deep learning theory notes sections 2 and 5 (this is written at a rather advanced level; try to get the intuitions rather than all the details) |
||
Week 3 | |||||
Tue 9/17 | Architectures: Grids. This lecture focuses mostly on convolutional neural networks, presenting them as a good choice when your data lies on a grid (see the sketch below this entry). |
Sara Beery |
slides
required reading: CNNs |
||
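An unofficial sketch of the grid-structured computation this lecture describes: a tiny convolutional network in PyTorch. All layer sizes, the 32x32 input, and the 10-way head are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A tiny convolutional network for 32x32 RGB images (shapes are illustrative).
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # weight sharing across the grid
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                       # global pooling -> 1x1 per channel
    nn.Flatten(),
    nn.Linear(32, 10),                             # 10-way classifier head
)

logits = net(torch.randn(8, 3, 32, 32))            # batch of 8 images
print(logits.shape)                                # torch.Size([8, 10])
```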
Thu 9/19 | Architectures: Graphs. This lecture covers graph neural networks (GNNs), showing connections to MLPs, CNNs, and message-passing algorithms. We will also discuss theoretical limitations on the expressive power of GNNs, and their practical implications (see the sketch below this entry). |
Phillip Isola |
slides
required reading: Section 5 of GRL book (mainly focus on the content through 5.1) optional readings: How Powerful are Graph Neural Networks? Distill blog on GNNs |
||
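An unofficial sketch of one round of message passing on a graph, assuming a dense 0/1 adjacency matrix and mean aggregation; real GNN libraries typically use sparse edge lists instead.

```python
import torch
import torch.nn as nn

class MeanMessagePassing(nn.Module):
    """One GNN layer: aggregate neighbor features, then update each node."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.update = nn.Linear(2 * d_in, d_out)

    def forward(self, h, adj):
        # h: (num_nodes, d_in) node features; adj: (num_nodes, num_nodes) 0/1 adjacency.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        msg = adj @ h / deg                        # mean over each node's neighbors
        return torch.relu(self.update(torch.cat([h, msg], dim=-1)))

h = torch.randn(5, 8)                              # 5 nodes, 8-dim features
adj = (torch.rand(5, 5) > 0.5).float()             # random illustrative graph
layer = MeanMessagePassing(8, 16)
print(layer(h, adj).shape)                         # torch.Size([5, 16])
```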
Week 4 | |||||
Tue 9/24 | Generalization Theory. Basic generalization theory. Overparameterization. Double descent. Inadequacy of VC dimension. Inductive biases in deep learning. |
Phillip Isola |
slides optional readings: Understanding deep learning requires rethinking generalization Double descent Probable networks and plausible predictions |
pset 1 due pset 2 out |
|
Thu 9/26 | Scaling rules for optimisation. Spectral perspective on neural computation. Feature learning and hyperparameter transfer. Scaling rules for hyperparameter transfer across width and depth. |
Jeremy Bernstein |
slides
required reading: Steepest descent |
||
Week 5 | |||||
Tue 10/1 | Architectures: Transformers. Three key ideas: tokens, attention, positional codes. Relationship between transformers and MLPs, GNNs, and CNNs -- they are all variations on the same themes! (See the attention sketch below this entry.) |
Phillip Isola |
slides required reading: Transformers (note that this reading focuses on examples from vision but you can apply the same architecture to any kind of data) |
||
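An unofficial sketch of single-head scaled dot-product self-attention, the core operation behind the "attention" idea in this lecture; positional codes, multiple heads, and the MLP blocks of a full transformer are omitted, and all dimensions are illustrative.

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project tokens to queries/keys/values
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    attn = scores.softmax(dim=-1)                    # each token attends to all tokens
    return attn @ v                                  # weighted sum of values

d = 32
tokens = torch.randn(10, d)                          # 10 tokens; no positional code here
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
print(self_attention(tokens, w_q, w_k, w_v).shape)   # torch.Size([10, 32])
```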
Thu 10/3 | Hacker's guide to DL. Practical tips mixed with opinionated anecdotes about how to get deep nets to actually do what you want. |
Phillip Isola |
slides optional readings: Recipe for training NNs Rules of ML |
||
Week 6 | |||||
Tue 10/8 | Architectures: Memory. RNNs, LSTMs, memory, sequence models (see the sketch below this entry). |
Sara Beery |
slides
required reading: RNNs optional reading: RNN Stability analysis and LSTMs |
pset 2 due pset 3 out |
|
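An unofficial sketch of recurrence as weight sharing over time, using PyTorch's built-in RNNCell; sequence length, batch size, and dimensions are illustrative.

```python
import torch
import torch.nn as nn

# Unrolling a single recurrent cell over a sequence.
cell = nn.RNNCell(input_size=8, hidden_size=16)
xs = torch.randn(20, 4, 8)                       # 20 time steps, batch of 4, 8-dim inputs
h = torch.zeros(4, 16)                           # initial hidden state (the "memory")
for x_t in xs:
    h = cell(x_t, h)                             # same weights reused at every step
print(h.shape)                                   # torch.Size([4, 16])
```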
Thu 10/10 | Representation learning: Reconstruction-based. Intro to representation learning, representations in nets and in the brain, autoencoders, clustering and VQ, self-supervised learning with reconstruction losses. |
Phillip Isola |
slides
required reading: Representation learning optional reading: Representation learning: A review |
||
Week 7 | |||||
Tue 10/15 | Student holiday | ||||
Thu 10/17 | Representation learning: Similarity-based. Metric learning, contrastive learning, self-supervised and supervised variants, InfoNCE, alignment and uniformity, hard negatives (see the InfoNCE sketch below this entry). |
Sara Beery |
slides
required reading: (same as previous lecture) optional readings: Alignment and uniformity Contrastive learning blog, covering lots of recent methods |
||
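An unofficial sketch of the InfoNCE objective with in-batch negatives, assuming two embedded views of the same batch of examples; the temperature value is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE with in-batch negatives: row i of z1 should match row i of z2."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature           # pairwise cosine similarities
    targets = torch.arange(z1.shape[0])          # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)   # embeddings of two views of 64 examples
print(info_nce(z1, z2))
```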
Week 8 | |||||
Tue 10/22 | Representation learning: Theory. A look at the inductive biases of architecture. Gaussian processes and the Neural Network--Gaussian Process correspondence. |
Jeremy Bernstein |
slides
optional readings: Kernel methods for DL DNN as Gaussian Processes |
pset 3 due pset 4 out Final project proposal guidelines out |
|
Thu 10/24 | Generative models: Basics. Density and energy models, samplers, GANs, autoregressive models, diffusion models. |
Phillip Isola |
slides
required reading: Generative Models |
||
Week 9 | |||||
Tue 10/29 | Generative models: Representation learning meets generative modeling. VAEs, latent variables (see the VAE sketch below this entry). |
Phillip Isola |
slides required reading: Generative modeling meets representation learning optional reading: VAE paper |
||
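An unofficial sketch of the VAE objective (reconstruction plus KL, via the reparameterization trick), using single linear layers as stand-ins for the encoder and decoder; all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal VAE pieces: encoder outputs (mu, log_var), decoder reconstructs x.
enc = nn.Linear(784, 2 * 16)                     # 16-dim latent (illustrative sizes)
dec = nn.Linear(16, 784)

x = torch.rand(32, 784)                          # fake "images" with values in [0, 1]
mu, log_var = enc(x).chunk(2, dim=-1)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterization trick
recon = torch.sigmoid(dec(z))

recon_loss = F.binary_cross_entropy(recon, x, reduction="sum") / x.shape[0]
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / x.shape[0]
loss = recon_loss + kl                           # negative ELBO (per example)
```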
Thu 10/31 | Generative models: Conditional models. cGAN, cVAE, conditional diffusion models, paired and unpaired translation, image-to-image, text-to-image, text-to-text, image-to-text. |
Phillip Isola |
slides
optional reading: Conditional generative models |
||
Week 10 | |||||
Tue 11/5 | Generalization (OOD). Exploring model generalization out of distribution, with a focus on adversarial robustness and distribution shift. |
Sara Beery |
slides
required reading: Adversarial examples Training robust classifiers optional readings: WILDS: A Benchmark of in-the-Wild Distribution Shifts Shortcuts in NN From ImageNet to Image Classification Noise or Signal Extrapolation |
pset 4 due pset 5 out |
|
Thu 11/7 | Transfer learning: Models. Finetuning, linear probes, knowledge distillation, foundation models (see the linear-probe sketch below this entry). |
Sara Beery |
slides
required reading: Transfer learning and adaptation optional readings: A Brief Review of Domain Adaptation Align and Distill: Unifying and Improving Domain Adaptive Object Detection |
||
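An unofficial sketch of a linear probe: freeze a backbone (here a randomly initialized stand-in for a pretrained model) and train only a new linear head; full finetuning would instead leave all parameters trainable.

```python
import torch
import torch.nn as nn

# Linear probe: frozen backbone, trainable linear head.
backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU())   # stand-in for a pretrained model
for p in backbone.parameters():
    p.requires_grad = False                      # frozen features

probe = nn.Linear(256, 10)                       # the only trainable parameters
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

x, y = torch.randn(128, 784), torch.randint(0, 10, (128,))
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(backbone(x)), y)
    loss.backward()
    opt.step()
```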
Week 11 | |||||
Tue 11/12 | Transfer learning: Data. Generative models as data++, domain adaptation, prompting. |
Sara Beery |
slides
required reading: (same as previous lecture) |
pset 5 due | |
Thu 11/14 | Scaling laws. Scaling laws for different neural architectures, power laws, breaking power laws, theoretical underpinnings, critical batch size. |
Phillip Isola | Final project proposal due | ||
Week 12 | |||||
Tue 11/19 | Guest Lecture: Large Language Models |
Jacob Andreas | |||
Thu 11/21 | Guest Lecture: Deep Learning for Music |
Anna Huang | |||
Week 13 | |||||
Tue 11/26 | TBD. Advanced tools for thinking about gradient descent on arbitrary computational graphs. Includes "metrisation" and "non-dimensionalisation" of neural architecture. |
Jeremy Bernstein | |||
Thu 11/28 | No class: Thanksgiving | ||||
Week 14 | |||||
Tue 12/3 | TBD. Overview of key topics from the course; data and model scaling; LLMs and reasoning agents; automation and regulation. |
Jeremy Bernstein | |||
Thu 12/5 | Project office hours | ||||
Week 15 | |||||
Tue 12/10 | Efficient Policy Optimization Techniques for LLMs. Post-training is essential for enhancing large language model (LLM) capabilities and aligning them with human preferences. One of the most widely used post-training techniques is reinforcement learning from human feedback (RLHF). In this talk, I will first discuss the challenges of applying RL to LLM training. Next, I will introduce RL algorithms that tackle these challenges by exploiting key properties of the underlying problem. I will also present an approach that simplifies RL policy optimization for LLMs to relative reward regression. Finally, I will extend this idea to a policy optimization technique for multi-turn RLHF. |
Kianté Brantley | Final project due