6.S898 Deep Learning

Fall 2021

[ Schedule | Piazza | Canvas ]

[Zoom link for class]

[Final project blogs]

Course Overview

Description: Fundamentals of deep learning, including both theory and applications. Topics include neural net architectures (MLPs, CNNs, RNNs, transformers), backpropagation and automatic differentiation, learning theory and generalization in high dimensions, and applications to computer vision, natural language processing, and robotics. Each lecture will be from a different invited expert in the field.

Pre-requisites: 6.036, 6.041 or 6.042, 18.06; basically, if you have taken an intro course on ML, at the level of 6.036 or beyond, then you should be in good shape.

Note: This is a graduate-level course of 3-0-3 units. To access the Zoom link, you will need an account with an MIT / Harvard / Wellesley email. For non-students who want access to Piazza or Canvas, email the TA to be added manually. For non-MIT students, refer to cross-registration. The class will be held in person and streamed live over Zoom.

Course Information

Instructor Phillip Isola

phillipi at mit dot edu

OH: Mon 3:00pm-4:00pm. (Zoom)

TA Minyoung Huh (Jacob)

minhuh at mit dot edu

OH: Thu 3:00pm-4:00pm. (Zoom)

- Logistics

- Grading Policy

  • 5% participation
  • 60% weekly questions
  • 35% final project

    Class Schedule

    ** class schedule is subject to change **

    Date Topics Speaker Course Materials Assignments
    Week 1: Introduction to neural networks
    Thu 9/9 Course overview, introduction to deep neural networks and their basic building blocks Phillip Isola slides
    Week 2: Differentiable programming and rethinking generalization
    Tue 9/14 SGD, Backprop and autodiff, differentiable programming Phillip Isola slides
    HW 1 released
    Thu 9/16 Generalization in neural networks: classical and modern perspectives
    + abstract We will start by briefly discussing the classical approach to generalization bounds, large margin theory, and complexity of neural networks. We then discuss recent interpolation results, the double or multiple-descent phenomenon, and the linear regime in overparametrized neural networks.
    Alexander Rakhlin slides Assigned readings:
    - Double descent
    - Benign overfitting (proofs optional)

    Optional readings:
    - Deep double descent
    - Rethinking generalization
    Week 3: Generalization and approximation theory
    Tue 9/21 Out-of-distribution generalization, bias, adversaries, and robustness
    + abstract Our current machine learning models achieve impressive performance on many benchmark tasks. Yet, these models can become remarkably brittle and susceptible to manipulation when deployed in the real world. Why is this the case? In this lecture, we take a closer look at this question, and pinpoint some of the roots of this observed brittleness. Specifically, we discuss how the way current ML models “learn” might not necessarily align with our expectations, and then outline possible approaches to alleviate this misalignment.
    Aleksander Madry slides HW 1 due (1pm) / HW 2 released

    Assigned readings:
    - Adversarial examples are not bugs (blog)

    Optional readings:
    - From ImageNet to Image Classification (blog)
    - Noise or signal (blog)
    Thu 9/23 Approximation theory
    + abstract How well can you approximate a given function by a DNN? We will explore various facets of this issue, from universal approximation to Barron's theorem. And does increasing the depth provably help for expressivity?
    Ankur Moitra notes 18.408 Assigned readings:
    - Universal approximation bounds (proofs optional; focus on Sections I and II)

    Optional readings:
    - Benefits of depth
    - Neural Tangent Kernels
    - Power of depth
    - NNs are universal approximators
    Week 4: Deep neural architectures
    Tue 9/28 CNNs, Transformers, Resnets, and Encoder-decoders
    + abstract In this lecture we will cover convolutional neural networks, including the ideas of filtering, patch-wise processing, multiscale representations, encoder-decoder architectures, residual connections, and self-attention layers. We will use image and video modeling problems as a lens onto when and why these architectures are useful.
    Phillip Isola slides HW 2 due (1pm) / HW 3 released

    Assigned readings:
    - AlexNet

    Optional readings:
    - Residual networks
    - Non-local networks
    - Vision transformers (read "Attention is all you need" first)
    Thu 9/30 RNNs, feedback, memory models, and sequence models
    + abstract In this lecture we will learn about recurrent neural networks and attention, which are fundamental building blocks of natural language processing systems. We will motivate these modules with two standard tasks: language modeling and machine translation.
    Yoon Kim slides Assigned readings:
    - Neural Machine Translation
    - Attention is all you need (blog)

    Optional readings:
    - Seq2Seq
    - Probabilistic Language Model
    - Difficulty of training RNNs
    - Understanding RNNs
    - Effectiveness of RNNs
    Week 5: Hacker’s guide
    Tue 10/5 How to be an effective deep learning researcher / engineer
    + abstract In this lecture, we'll discuss the practical side of developing deep learning systems. We will focus on best practices, common mistakes to look for, and evaluation methods for developing deep learning models. While optimization methods and software design practices for Deep Learning are still under development, this lecture will try to present several tried and true implementation and debugging strategies for diagnosing failures in model training and help make model training less painful in the future.
    Dylan Hadfield-Menell slides HW 3 due (1pm) / HW 4 released

    Assigned readings:
    - A Recipe for Training NNs
    - PyTorch 60 minute blitz

    Optional readings:
    - Rules of Machine Learning
    Thu 10/7 Visualization and interpretation of deep embeddings
    + abstract One of the great challenges of neural networks is to understand how they work. Machine learning leaves the programmer ignorant of the details of what the network computes internally, or why. But what if we could ask the network itself what it is thinking? I will discuss methods to directly probe the internal structure of a deep convolutional neural network by testing the activity of individual neurons. Beginning with the simple proposal that an individual neuron might represent one internal concept, I will talk about how to investigate the role of neurons within a deep network in a concrete, quantitative way: Which neurons? Which concepts? How are neurons organized? What is their causal role? Following this inquiry within state-of-the-art models in computer vision leads us to insights about the computational structure of those deep networks that enable several new applications, including "GAN Paint" semantic manipulation of objects in an image and quick, selective editing of generalizable rules within a fully trained GAN. In the talk, we challenge the notion that the internal calculations of a neural network must be hopelessly opaque. Instead, we strive to tear back the curtain and chart a path through the detailed structure of a deep network by which we can begin to understand its logic.
    David Bau slides Assigned readings:
    - Understanding the role of individual units in a deep neural network
    Week 6: Generative modeling and representation learning
    Tue 10/12 Generative models
    + abstract I will present the basic idea of deep generative modeling, and cover three of the most popular kinds: GANs, VAEs, and autoregressive models. I will also show how these models can be used for applications like structured prediction and domain translation.
    Phillip Isola slides HW 4 due (1pm) / HW 5 released

    Assigned readings:
    - GAN
    - CycleGAN

    Optional readings:
    - Tutorial on VAEs
    - DALL-E
    - WaveNet
    Thu 10/14 Self-supervised Scene Representation Learning
    + abstract Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day parallel people’s ability to understand, navigate, and interact with their surroundings. This poses a unique set of challenges that sets neural scene representations apart from conventional representations of 3D scenes: Rendering and processing operations need to be differentiable, and the type of information they encode is unknown a priori, requiring them to be extraordinarily flexible. At the same time, training them without ground-truth 3D supervision is a highly underdetermined problem, highlighting the need for structure and inductive biases without which models converge to spurious explanations. Focusing on 3D structure, a fundamental feature of natural scenes, I will demonstrate how we can equip neural networks with inductive biases that enable them to learn 3D geometry, appearance, and even semantic information, self-supervised only from posed images. I will show how this approach unlocks the learning of priors, enabling 3D reconstruction from only a single posed 2D image, and how we may extend these representations to other modalities such as sound. I will then discuss how these efforts advance us towards a unified scene representation learning backbone for applications across computer vision, computer graphics, robotics, and other areas of computer science, and what key challenges remain.
    Vincent Sitzmann slides Assigned readings:
    - Scene representation networks
    - SIREN

    Optional readings:
    - NeRF
    - Light field networks
    - DeepSDF
    - PixelNeRF
    - more resources: here
    Week 7: Generative modeling and representation learning
    Tue 10/19 Cross-modal learning, self-supervision
    + abstract Despite exciting advances in the field of deep learning over the past decade, most state-of-the-art machine learning models require large quantities of annotated training data to achieve good performance. Over the last few years, there has been increasing attention paid to methods that do not require annotated data for learning, including a class of self-supervised learning methods that have been shown to be very effective across a range of modalities, including natural language and speech processing, and machine vision. In this lecture we discuss a related approach that tries to find commonalities across different modalities and is often called cross-modal learning. In particular we examine approaches that cross between language and vision, and describe recent research showing that we can learn effective correspondences between raw speech audio and raw images.
    Jim Glass slides HW 5 due (1pm) / HW 6 released

    Assigned readings:
    - Jointly discovering vision & audio (springer)

    Optional readings:
    - Hierarchical discrete linguistic units
    - Self-supervised audio-visual
    Thu 10/21 Contrastive learning / metric learning
    + abstract In this lecture, we will talk about unsupervised and weakly supervised learning, primarily through the lens of similarity driven learning. I’ll briefly talk about metric learning first, before moving onto self-supervised learning with a focus on contrastive learning (the modern cousin of metric learning).
    Suvrit Sra slides Assigned readings:
    - Align-uniformity in CL
    - CL with hard negatives

    Optional readings:
    - SimCLR
    - Contrastive predictive coding
    - Geometric mean metric learning
    - CLIP
    - Can CL avoid shortcuts?
    - Blogs: (CL) (representation)
    Week 8: Deep nets as priors
    Tue 10/26 Finetuning, transfer learning, distillation; "foundation models"
    + abstract A common critique of deep learning is that it requires enormous amounts of data and compute. This lecture will present ways to instead train deep nets with little data and little compute. The key idea is to think of deep nets as stores of prior knowledge; when a new problem comes along, we will minimally adapt our prior nets to efficiently solve it.
    Phillip Isola slides HW 6 due (1pm) / HW 7 released

    Assigned readings:
    - Foundation models (section 1)

    Optional readings:
    - GPT3
    - BERT
    - CLIP
    - Universal computation engines
    Thu 10/28 Learning generalist models, learning to learn
    + abstract In learning, it used to be the case that humans were generalists whereas machines were specialists; not anymore. I will talk about learning to learn, beginning with a brief review of meta learning in its broad sense and a detailed presentation of automated machine learning. Next, I will demonstrate that machines are also generalists by automatically learning to learn dozens of MIT EECS, Physics, Economics, AeroAstro, and Math courses using the surprisingly simple yet strong result that foundation models trained on both text and code for program synthesis, such as OpenAI Codex, succeed in generating correct code for solving STEM course problems. The emerging ability of automatically learning to learn courses makes the human skill of learning courses similar to the skills of remembering phone numbers or navigating before the age of mobile phones and GPS; courses remain useful for humans for cognitive fitness. Finally, I will talk about automated self-evaluation and self-improvement of foundation models that learn to learn.
    Iddo Drori slides Assigned readings:
    - Solving ML with ML

    Optional readings:
    - Codex
    - MAML (blog)
    Week 9: Deep learning on structured data
    Tue 11/2 Graph neural networks
    + abstract Machine learning tasks on graphs arise in a broad range of settings, including property prediction of molecules for drug and materials design, predicting interactions of chemical compounds, traffic forecasting (Google Maps), recommender systems, social network analysis, learning and forecasting physics simulations (e.g., interactions of particles), and improving solvers for combinatorial problems via learnable components, e.g., for chip design. Graph Neural Networks (GNNs), specialized deep learning models for graph inputs, have recently led to many successes in these applications. In this lecture, I will introduce the most popular type of graph neural network and some of its applications. Then, we will explore some recent results on their learning: What can they represent well (or not)? What can they learn well? What does this depend on? Could we do better?
    Stefanie Jegelka slides HW 7 due (1pm) / HW 8 released

    *Final project handout*

    Assigned readings:
    - How powerful are GNNs?
    - GRL (chapter 5, 7.3 optional)

    Optional readings:
    - Intro to GNNs blog (part 1) (part 2)
    - Graph convolutional networks
    - Graph attention networks
    - Neural Message Passing for Quantum Chemistry (focus Section 2)
    Thu 11/4 Geometric deep learning
    + abstract Geometric deep learning supplies a common framework through which we may understand many successful architectures for learning on geometrically-structured domains. In this lecture, I will frame the typical machine learning task as a function approximation problem and demonstrate how the principles of invariance and equivariance may be leveraged to design function classes that encode powerful inductive biases. I will present popular architectures such as CNNs and GNNs as instances of this framework and illustrate via a worked example how gated RNN architectures (such as the widely-used LSTM) may be derived from the principle of time-warping invariance.
    Chris Scarvelis slides Assigned readings:
    - Geometric deep learning

    Optional readings:
    - GDL (longer version)
    - GDL blog
    - Deep sets
    - Group equivariant CNNs
    - Can RNNs warp time
    Week 10: Hardware for deep learning
    Tue 11/9 Hardware architectures for deep learning
    + abstract As you know, deep neural networks (DNNs) are currently widely used for many AI applications. However, while DNNs deliver state-of-the-art accuracy on many AI tasks, that accuracy comes at the cost of high computational complexity. Therefore, designing efficient hardware architectures for deep neural networks is important for facilitating the wider deployment of DNN-based systems. Unfortunately, the state and trajectory of computing technology make this a challenge in general-purpose processing engines and so there has been an increasing focus on dedicated hardware accelerators for DNNs. This lecture will outline these challenges and provide an overview of the design space of DNN accelerators including the management of data movement, exploiting sparsity and the potential applications of novel technologies.
    Joel Emer slides HW 8 due (1pm) / HW 9 released

    Assigned readings:
    - Survey on hardware for deep learning (Sections 1 through 4 are review you can skim; focus on Section 5 and beyond)

    Optional readings:
    - Eyeriss
    Thu 11/11 🎖️ No Class - Veterans Day 🎖️
    Week 11: Deep reinforcement learning
    Tue 11/16 Control, policy gradient, Q-learning
    + abstract Embodied agents often learn through the process of trial and error, interacting with an environment in diverse ways to learn how to perform complex tasks. In this lecture we will study how to formalize the interactive learning problem through the lens of reinforcement learning. In particular, we will start with the fundamentals of reinforcement learning and study how it connects to ideas in deep supervised and unsupervised learning. We will study efficient solution techniques for these classes of problems and understand how well they work for a variety of problems of interest ranging from character control to games. In the second part of the lecture, we will study how these ideas can be applied to real world robotics problems and why the resulting application is not so straightforward due to a range of mismatched assumptions. I will discuss some of our work in building algorithms and systems to bridge the gap and allow robotic learning systems to operate under the assumptions of the real world. I will show how these techniques can be applied to real world robotic systems at scale and discuss how these have the potential to be applicable more broadly across a variety of machine learning applications.
    Abhishek Gupta slides HW 9 due (1pm) / HW 10 released

    Assigned readings:
    - A (Long) Peek Into RL

    Optional readings:
    - Ingredients for Real World Robotic RL (blog)
    - Imitation from Observation
    - Accelerating Online RL with Offline Datasets
    Thu 11/18 Deep RL for robotics
    + abstract Many impressive robotic systems exist. A humanoid can perform backflips, walk over natural landscapes, etc. But why are robots so far away from being useful in our daily life? This lecture will discuss the shortcomings of existing approaches and elaborate on how machine learning techniques can help overcome some of these challenges. We will discuss different learning paradigms -- self-supervision, learning from demonstrations, and model-based / model-free learning. As part of this discussion, I will present learning-based methods for building robotic systems that can push objects, manipulate ropes, re-orient objects, navigate, locomote over complex terrains, etc. The lecture will end with perspectives on open problems and challenges.
    Pulkit Agrawal slides Final project proposal due (1pm)

    Assigned readings:
    - How to Train Your Robot with Deep RL

    Optional readings:
    - A Review of Robot Learning for Manipulation
    - In-Hand Object Re-Orientation
    Week 12: Neurosymbolic systems
    Tue 11/23 Compositionality and structured generalization
    + abstract These days we mostly hear about success in AI from relatively unstructured deep network models. But symbolic approaches to reasoning and learning---built from logical forms and discrete deduction rules rather than vector representations and linear algebra---also have a number of remarkable properties, notably extreme sample efficiency, strong out-of-distribution generalization, and human interpretability. What can deep learning learn from the successes of symbolic AI, and how can we design models that combine the strengths of deep representation learning and symbolic processing? This lecture will offer a (woefully incomplete) tour of modern neuro-symbolic approaches to question answering, sequential decision-making, and interpretability, and outline a few possible directions for future research.
    Jacob Andreas slides Optional readings:
    - Compositional NN for QA
    - Modular multitask RL
    - Analogs of Linguistic Structure
    Thu 11/25 🦃 No Class - Thanksgiving 🦃
    Week 13: Deep Learning in Industry and Society
    Tue 11/30 Deep Learning’s appetite for computation and what it means for performance, sustainability and business use
    + abstract In this lecture, we will discuss the economic and sustainability challenges of using deep learning in the real world. To ground this discussion, we’ll talk through a real case study of a business implementing deep learning. We’ll then broaden out to understand how progress is being made in deep learning and what that means for its sustainability – with sustainability being important for the environment, but also for whether deep learning can continue to be the main driving force for progress in machine learning.
    Neil Thompson slides Optional readings:
    - The Computational Limits of Deep Learning
    Thu 12/2 Academia to Industry: Applying Deep Learning at Scale Andrej Karpathy Optional videos:
    - Tesla AI Day
    - CVPR Workshop on Autonomous Vehicles
    Week 14: Additional topics
    Tue 12/7 The Lottery Ticket Hypothesis: On Sparse Trainable Neural Networks
    + abstract I recently proposed the lottery ticket hypothesis: that the dense neural networks we typically train have much smaller subnetworks capable of reaching full accuracy from early in training. This hypothesis raises (1) scientific questions about the nature of overparameterization in neural network optimization and (2) practical questions about our ability to accelerate training. In this talk, I will discuss established results and the latest developments in my line of work on the lottery ticket hypothesis, including the empirical evidence for these claims on small vision tasks, changes necessary to scale these ideas to practical settings, and the relationship between these subnetworks and their "stability" to the noise of stochastic gradient descent. I will also describe my vision for the future of research on this topic.
    Jonathan Frankle slides
    Optional readings:
    - The lottery ticket hypothesis
    - Linear mode connectivity
    Thu 12/9 Drug discovery
    + abstract Traditional approaches to drug discovery are expensive and time-consuming. In this lecture, I will discuss how to accelerate drug discovery with deep learning, and demonstrate its success in antibiotic discovery and COVID-19. The lecture consists of three parts: graph neural networks for virtual drug screening, graph-based generative models for molecular graphs, and geometric generative models for antibody design.
    Wengong Jin slides
    Final project due (11:59pm) Optional readings:
    - Antibiotic discovery
    - COVID19 discovery
    - Junction tree VAE
    - Hierarchical VAE
    - Iterative refinement GNN