6.882 Embodied Intelligence, Spring 2019

Assignment #5

Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about (you don't have to answer all of them):

Mask R-CNN

Are object masks / bounding boxes a useful representation of state? For complex scenes, how should we choose what intermediate representations to supervise?
Suppose you do not have human annotations to use as supervision. What are some ways you might learn to see objects?
What are some limitations of this representation of images? What happens if an object is partially (or fully) occluded?

Learning to See Physics via Visual De-animation

This approach relies, in part, on a physics/graphics simulator as the teaching signal. What are the limitations of this? What if you don't have a physics simulator?
In real scenes, recovering the complete physical state may be impractical. If we want to use this approach to make predictions and achieve control, how can we deal with the problem of uncertainty about unobserved aspects of the state?
In a game of billiards, I know that all the balls will eventually end up in the pockets. How would the method in the paper go about making that prediction? What might be some difficulties or inefficiencies in the presented approach?

The Feeling of Success

What are some problems where tactile information would be especially helpful? What about other modalities? For example, what kinds of problems would sound help a lot with?
GelSight can sense 3D deformations as you press against a surface. What are some signals our fingers can sense that GelSight does not?
GelSight requires a camera that looks at a deforming surface. What are the limitations of this approach? Do you have any ideas for how we could practically create a full-body GelSight skin?

Upload a single PDF file through Stellar by Feb 26 at 10 am.