Assignment #10
The readings for this assignment all revolve around hierarchical RL (HRL) and hierarchical planning, where we make decisions over high-level choices that each entail a sequence of lower-level actions.
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are questions to think about (you should discuss at least some of these but you don't have to address them all):
Feudal RL
Questions
- What information is being hidden? How do the principles of information-hiding in this paper extend to more complex domains that are, for example, not explicitly spatial?
- What "goals" is a higher-level controller communicating to its lower-level controller?
- The example studied in this paper is a stochastic shortest path problem. Does the approach extend to more general reward functions?
Diversity is All You Need
Questions
- DIAYN optimizes for different skills to induce different state distributions, rather than different action distributions. Why? Is this always a good idea? Can you construct a scenario where state diversity is not enough, and having skills with action diversity is also beneficial?
- How could knowledge of task rewards be used to improve skill learning beyond DIAYN alone (which ignores rewards)?
- Compare and contrast DIAYN with the papers we read for last class, on novelty search and curiosity.
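For reference when thinking about the DIAYN questions above, the skill reward in DIAYN takes the form log q(z|s) - log p(z), where q is a learned discriminator that predicts the skill z from the state s and p(z) is the skill prior. The following is only a minimal sketch of that quantity, assuming a uniform prior over a fixed number of skills; the function name and the list-of-probabilities input are hypothetical and are not taken from the paper's code.

```python
import math


def diayn_pseudo_reward(discriminator_probs, skill, num_skills):
    """Sketch of the DIAYN intrinsic reward log q(z|s) - log p(z).

    discriminator_probs: hypothetical list holding q(z|s) for every skill z,
        evaluated at the current state s (assumed to sum to 1).
    skill: index of the skill z that is currently being executed.
    num_skills: number of skills; the prior p(z) is assumed uniform.
    """
    log_q = math.log(discriminator_probs[skill])
    log_p = math.log(1.0 / num_skills)
    return log_q - log_p


# A state from which the discriminator cannot tell the skills apart
# gives zero reward; a state that identifies the skill gives positive reward.
uninformative = diayn_pseudo_reward([0.25, 0.25, 0.25, 0.25], skill=0, num_skills=4)
informative = diayn_pseudo_reward([0.7, 0.1, 0.1, 0.1], skill=0, num_skills=4)
```

Note that the action never appears in this expression: the reward depends only on which state was reached and which skill was active.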
Code as Policies
Questions
- Describe how Code as Policies can be viewed as a form of hierarchical planning. What is the hierarchy? What is doing the planning?
- To use Code as Policies, a human has to write elaborate prompts that teach the LLM how to translate natural language commands into code policies (see Appendix for examples). How might you reduce this burden?
- In this paper, skills are represented as symbolic programs, whereas in DIAYN skills were represented by continuous neural policies, and in Feudal RL skills were represented as tabular Q functions. What are some of the advantages and disadvantages of each kind of skill representation?
Upload a single PDF file through Canvas by March 14th at 1 pm.