Assignment #10
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.
Feudal RL
Questions
- What information is being hidden? How do the principles of information-hiding in this paper extend to more complex domains that are, for example, not explicitly spatial?
- What "goals" is a higher-level controller communicating to its lower-level controllers?
- The example studied in this paper is a stochastic shortest paths problem. Does the approach extend to more general reward functions?
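As a concrete point of reference for the information-hiding and reward-hiding questions above, here is a minimal sketch (an illustration under assumed details, not the paper's code or experiment): a manager that sees the task goal issues direction commands, while the worker sees only its own position and is rewarded for obedience rather than for the task itself.

```python
# Illustrative sketch of two Feudal RL principles (assumptions, not the
# paper's setup): information hiding (the worker never sees GOAL) and
# reward hiding (the worker is rewarded for obeying the command, not for
# task progress). The grid size and command set are hypothetical.

GOAL = (3, 3)  # 4x4 grid; the task goal is known only to the manager

def manager_command(pos):
    """Manager sees the full state and issues a direction command."""
    x, y = pos
    if x < GOAL[0]:
        return "east"
    if y < GOAL[1]:
        return "north"
    return "stay"

def worker_step(pos, command):
    """Worker sees only its own position and the command, not GOAL.
    Its intrinsic reward is for obeying the command (reward hiding)."""
    x, y = pos
    moves = {"east": (x + 1, y), "north": (x, y + 1), "stay": (x, y)}
    new_pos = moves[command]
    intrinsic_reward = 1.0  # obeyed the command
    return new_pos, intrinsic_reward

pos = (0, 0)
steps = 0
while pos != GOAL:
    cmd = manager_command(pos)
    pos, _ = worker_step(pos, cmd)
    steps += 1

print(steps)  # 6 steps: 3 east, then 3 north
```

Note how nothing in `worker_step` depends on `GOAL`; extending this to non-spatial domains means choosing what state abstraction the commands are expressed in.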
h-DQN
Questions
- What information is being hidden?
- What goals are being communicated to the lower-level, e.g. in Atari? Are there goals that cannot be specified this way?
- When are the low-level actions terminated?
- Could this method be extended to have multiple levels?
- How does this architecture differ from feudal Q-learning? Does it require more or less engineering effort to set up?
- "It is important to note that transitions generated by Q_2 run at a slower time-scale than the transitions generated by Q_1." Why is this? What determines when a transition happens at the Q_2 level?
- Could you train this same architecture using REINFORCE? In what ways would that be a good/bad idea?
- Is the game in section 4.1 an MDP? Why do you think this method works better than flat Q-learning?
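To make the two-timescale question above concrete, here is a toy sketch (an assumed chain environment and tabular updates, not the paper's code or architecture): a meta-controller Q_2 picks a sub-goal, the controller Q_1 takes primitive actions until that sub-goal is reached, and only then does Q_2 record a single transition, so Q_2's transitions are generated at a slower time-scale than Q_1's.

```python
# Toy two-timescale sketch inspired by h-DQN's structure (hypothetical
# environment and goal set, tabular instead of deep Q): Q1 learns over
# (state, goal, action) with intrinsic reward for reaching the goal; Q2
# learns over (state, goal) with extrinsic reward, one transition per
# completed sub-goal, hence at a slower time-scale.
import random

random.seed(0)

N = 6            # chain: states 0..N-1, start at 0, terminal at N-1
GOALS = [2, 5]   # hypothetical sub-goal set
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

q1 = {}  # controller values: (state, goal, action) -> value
q2 = {}  # meta-controller values: (state, goal) -> value

def act(state, goal):
    """Epsilon-greedy controller action: move -1 or +1 on the chain."""
    if random.random() < EPS:
        return random.choice([-1, 1])
    return max([-1, 1], key=lambda a: q1.get((state, goal, a), 0.0))

def pick_goal(state):
    """Epsilon-greedy meta-controller choice of sub-goal."""
    if random.random() < EPS:
        return random.choice(GOALS)
    return max(GOALS, key=lambda g: q2.get((state, g), 0.0))

n_q1_transitions = 0
n_q2_transitions = 0
for episode in range(200):
    state = 0
    while state != N - 1:
        goal = pick_goal(state)
        start, ext_return = state, 0.0
        # Fast time-scale: controller acts until the sub-goal is reached.
        while state != goal and state != N - 1:
            a = act(state, goal)
            nxt = min(max(state + a, 0), N - 1)
            r_int = 1.0 if nxt == goal else 0.0        # intrinsic reward
            ext_return += 1.0 if nxt == N - 1 else 0.0  # extrinsic reward
            best = max(q1.get((nxt, goal, b), 0.0) for b in [-1, 1])
            td = r_int + GAMMA * best - q1.get((state, goal, a), 0.0)
            q1[(state, goal, a)] = q1.get((state, goal, a), 0.0) + ALPHA * td
            state = nxt
            n_q1_transitions += 1
        # Slow time-scale: one Q2 transition spanning many Q1 steps.
        best2 = max(q2.get((state, g), 0.0) for g in GOALS)
        td2 = ext_return + GAMMA * best2 - q2.get((start, goal), 0.0)
        q2[(start, goal)] = q2.get((start, goal), 0.0) + ALPHA * td2
        n_q2_transitions += 1

print(n_q1_transitions > n_q2_transitions)
```

The final print shows Q_2 accumulates far fewer transitions than Q_1: a Q_2 transition happens only when the controller terminates, which is exactly the question of what determines the slower time-scale.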
Upload a single PDF file through Stellar by Mar 12 at 10 am.