Assignment #13
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.
Deep RL that matters
Questions
- What might "random seed" govern in an RL experiment in a simulator?
- When (if ever) is it reasonable methodological practice to maximize over random seeds?
- When a plot shows a dark (mean) line and a lighter band around it, what is the meaning of that lighter band supposed to be?
Assessing Generalization in Deep RL
Questions
- This paper studies generalization across environmental conditions, such as the amount of sliding friction in a HalfCheetah's limbs. Suppose we want to study generalization across goal/reward conditions. Do you expect that vanilla methods, like PPO (i.e. policy gradients), could solve problems where the reward function or goal is different at test time?
- The gist of this paper is an age-old story: a bunch of fancy methods are published in year X; in year X+N a review comes out finding that simple baselines beat them. What sociotechnical forces might explain this? How might Henderson et al. (the authors of the first paper) explain it?
- Would Henderson et al. approve of the experiments in this paper? What remaining criticisms might they give?
Upload a single PDF file through Stellar by Apr 7 at 10 am.