Assignment #10
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.
Feudal RL
Questions
- What information is being hidden? How do the principles of information-hiding in this paper extend to more complex domains that are, for example, not explicitly spatial?
- What "goals" is a higher-level controller communicating to its lower-level controllers?
- The example studied in this paper is a stochastic shortest paths problem. Does the approach extend to more general reward functions?
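As a concrete point of reference for the information-hiding and reward-hiding questions above, here is a minimal sketch (an illustration under assumed details, not the paper's code or experiment): a manager that sees the task goal issues direction commands, while the worker sees only its own position and is rewarded for obedience rather than for the task itself.

```python
# Illustrative sketch of two Feudal RL principles (assumptions, not the
# paper's setup): information hiding (the worker never sees GOAL) and
# reward hiding (the worker is rewarded for obeying the command, not for
# task progress). The grid size and command set are hypothetical.

GOAL = (3, 3)  # 4x4 grid; the task goal is known only to the manager

def manager_command(pos):
    """Manager sees the full state and issues a direction command."""
    x, y = pos
    if x < GOAL[0]:
        return "east"
    if y < GOAL[1]:
        return "north"
    return "stay"

def worker_step(pos, command):
    """Worker sees only its own position and the command, not GOAL.
    Its intrinsic reward is for obeying the command (reward hiding)."""
    x, y = pos
    moves = {"east": (x + 1, y), "north": (x, y + 1), "stay": (x, y)}
    new_pos = moves[command]
    intrinsic_reward = 1.0  # obeyed the command
    return new_pos, intrinsic_reward

pos = (0, 0)
steps = 0
while pos != GOAL:
    cmd = manager_command(pos)
    pos, _ = worker_step(pos, cmd)
    steps += 1

print(steps)  # 6 steps: 3 east, then 3 north
```

Note how nothing in `worker_step` depends on `GOAL`; extending this to non-spatial domains means choosing what state abstraction the commands are expressed in.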
h-DQN
Questions
- What information is being hidden?
- What goals are being communicated to the lower-level, e.g. in Atari? Are there goals that cannot be specified this way?
- When are the low-level actions terminated?
- Could this method be extended to have multiple levels?
- How does this architecture differ from feudal Q-learning? Does it require more or less engineering effort to set up?
- "It is important to note that transitions generated by Q_2 run at a slower time-scale than the transitions generated by Q_1." Why is this? What determines when a transition happens at the Q_2 level?
- Could you train this same architecture using REINFORCE? In what ways would that be a good/bad idea?
- Is the game in section 4.1 an MDP? Why do you think this method works better than flat Q-learning?
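To make the two-timescale question above concrete, here is a toy sketch (an assumed chain environment and tabular updates, not the paper's code or architecture): a meta-controller Q_2 picks a sub-goal, the controller Q_1 takes primitive actions until that sub-goal is reached, and only then does Q_2 record a single transition, so Q_2's transitions are generated at a slower time-scale than Q_1's.

```python
# Toy two-timescale sketch inspired by h-DQN's structure (hypothetical
# environment and goal set, tabular instead of deep Q): Q1 learns over
# (state, goal, action) with intrinsic reward for reaching the goal; Q2
# learns over (state, goal) with extrinsic reward, one transition per
# completed sub-goal, hence at a slower time-scale.
import random

random.seed(0)

N = 6            # chain: states 0..N-1, start at 0, terminal at N-1
GOALS = [2, 5]   # hypothetical sub-goal set
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

q1 = {}  # controller values: (state, goal, action) -> value
q2 = {}  # meta-controller values: (state, goal) -> value

def act(state, goal):
    """Epsilon-greedy controller action: move -1 or +1 on the chain."""
    if random.random() < EPS:
        return random.choice([-1, 1])
    return max([-1, 1], key=lambda a: q1.get((state, goal, a), 0.0))

def pick_goal(state):
    """Epsilon-greedy meta-controller choice of sub-goal."""
    if random.random() < EPS:
        return random.choice(GOALS)
    return max(GOALS, key=lambda g: q2.get((state, g), 0.0))

n_q1_transitions = 0
n_q2_transitions = 0
for episode in range(200):
    state = 0
    while state != N - 1:
        goal = pick_goal(state)
        start, ext_return = state, 0.0
        # Fast time-scale: controller acts until the sub-goal is reached.
        while state != goal and state != N - 1:
            a = act(state, goal)
            nxt = min(max(state + a, 0), N - 1)
            r_int = 1.0 if nxt == goal else 0.0        # intrinsic reward
            ext_return += 1.0 if nxt == N - 1 else 0.0  # extrinsic reward
            best = max(q1.get((nxt, goal, b), 0.0) for b in [-1, 1])
            td = r_int + GAMMA * best - q1.get((state, goal, a), 0.0)
            q1[(state, goal, a)] = q1.get((state, goal, a), 0.0) + ALPHA * td
            state = nxt
            n_q1_transitions += 1
        # Slow time-scale: one Q2 transition spanning many Q1 steps.
        best2 = max(q2.get((state, g), 0.0) for g in GOALS)
        td2 = ext_return + GAMMA * best2 - q2.get((start, goal), 0.0)
        q2[(start, goal)] = q2.get((start, goal), 0.0) + ALPHA * td2
        n_q2_transitions += 1

print(n_q1_transitions > n_q2_transitions)
```

The final print shows Q_2 accumulates far fewer transitions than Q_1: a Q_2 transition happens only when the controller terminates, which is exactly the question of what determines the slower time-scale.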
Upload a single PDF file through Stellar by Mar 12 at 10 am.