Assignment #6
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about (you don't have to answer all of them; the general questions may be the most interesting):
Large Scale Unsupervised Learning
- This paper suggests that compressed representations are meaningful representations -- they align with English words, like "cat". Why do you think this is? Can you imagine a universe in which this would not be the case?
- The paper mentions that previous methods involved "a certain degree of supervision" since training images were "aligned, homogeneous and belong to one selected category." Is this a substantial limitation? Does the current paper get around it?
- Fig 3 shows two visualizations of what a trained neuron responds to. What are the advantages and disadvantages of each approach? Can you think of a scenario where one or the other would lead to a misleading visualization?
Contrastive Predictive Coding
- CPC's objective is to learn a representation of the present that is maximally informative about the future. The raw pixels themselves are one such representation (by the data processing inequality, they are maximally informative). Why does CPC learn a better representation than just using the raw pixels?
- Suppose we apply CPC to video frames. In that case, CPC would learn to classify between the real future frame, x_{t+k}, and a mismatched future frame, x_{j}, where x_{j} is sampled from a different video than x_{t}. What do you think will happen if each video in our dataset has a uniquely colored border around each of its frames? What features will be learned?
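For concreteness, the contrastive setup described in the CPC questions above can be sketched with the InfoNCE loss: score the true future encoding against mismatched ones and take a softmax cross-entropy on the positive. This is a minimal sketch with random data and hypothetical shapes; in the paper, z comes from an encoder and the context c from an autoregressive model over past encodings.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8             # embedding dimension (hypothetical)
n_negatives = 15  # mismatched samples x_j drawn from other videos

c_t = rng.normal(size=d)                   # context at time t
z_pos = rng.normal(size=d)                 # encoding of the true future x_{t+k}
z_neg = rng.normal(size=(n_negatives, d))  # encodings of mismatched frames
W_k = rng.normal(size=(d, d))              # step-specific bilinear weights

def score(z, c):
    # log of the bilinear critic f_k(x, c) = exp(z^T W_k c)
    return z @ W_k @ c

# Positive score first, then the negatives
scores = np.array([score(z_pos, c_t)] + [score(z, c_t) for z in z_neg])

# InfoNCE: cross-entropy of identifying the positive among N+1 candidates,
# computed with a max-shift for numerical stability
m = scores.max()
loss = -(scores[0] - (m + np.log(np.sum(np.exp(scores - m)))))
```

Note that the loss only cares about *discriminating* the true future from the negatives, which is why the border-color question above matters: any feature that identifies the source video can drive the loss to zero.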
General questions
This week we have seen various approaches to perceptual representation, which looked at different kinds of supervision and inductive biases that can induce good representations. Compare and contrast these approaches. In each one, how much knowledge is baked in and how much is emergent?
Do you think these kinds of methods, scaled way up, will result in representations similar to the mental representations in our brains? If not, what's missing?
Upload a single PDF file through Stellar by Feb 27 at 10 am.