Assignment #7
- For this class, we will be discussing planning methods for Partially Observable Markov Decision Processes (POMDPs). Next class we'll look at learning methods.
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are questions to think about (you should discuss at least some of these, but you don't have to address them all):
POMDP Tutorial
You don't need to read the last four sections. Alternatively, if you prefer equations, you can read this paper instead, skipping the details of the Witness algorithm.
Questions
- Which problems that we have seen in class can be modeled as POMDPs? What is a problem that cannot be modeled as an MDP or a POMDP (and why)?
- Why can a POMDP be seen as an MDP in belief space? (See the sketch after this list.)
- What can we say about the shape of the value function of a POMDP? Intuitively, why are the highest values along the outside edges?
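As a reference for the two questions above, here is a sketch of the standard belief update and value-function form, written in the usual tutorial notation (T for transitions, O for observations); double-check the symbols against the reading.

```latex
% Belief update after taking action a and observing o.
% Because b' depends only on (b, a, o), the belief is a
% sufficient statistic, and the process over beliefs is
% itself a (continuous-state) MDP.
b'(s') = \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}
              {\Pr(o \mid b, a)}

% Optimal finite-horizon value function: a maximum over a
% finite set \Gamma of linear functions ("alpha vectors"),
% hence piecewise-linear and convex in b. Convex functions
% on the belief simplex take their largest values toward the
% edges and corners, where the state is (nearly) certain.
V(b) = \max_{\alpha \in \Gamma} \alpha \cdot b
```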
POMCP
Skim the proofs in section 4.
Questions
- How does POMCP address the exploration-exploitation trade-off? What strategy is used for action selection, and how do the belief states affect this? (See the code sketch after this list.)
- POMCP approximates the belief state with K particles. What is particle reinvigoration and why is it helpful? Do you think it will be more useful for smaller or larger K?
- POMCP assumes that there is a generative model G(s, a) -> (s', o, r) that provides a sample of a successor state, observation, and reward. Do you think this is a reasonable assumption? How would POMCP fare with sparse rewards?
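To make the questions above concrete, here is a minimal sketch of the pieces they refer to: UCB1 action selection at a search node (the rule the POMCP paper uses), a black-box generative model G(s, a) -> (s', o, r), and particle reinvigoration. This is not the paper's implementation; the tiger-style toy domain and all names and constants here are assumptions for illustration only.

```python
import math
import random

ACTIONS = ["listen", "open-left", "open-right"]  # toy action set
UCB_C = 1.0  # exploration constant; problem-dependent in practice


def ucb1_select(node):
    """Return argmax_a Q(h,a) + c * sqrt(log N(h) / N(h,a)).

    `node` maps actions to {"visits": N(h,a), "value": Q(h,a)}.
    The bonus term drives exploration of rarely tried actions;
    the value term drives exploitation of good ones.
    """
    total_visits = sum(st["visits"] for st in node.values())
    best_action, best_score = None, -math.inf
    for action, st in node.items():
        if st["visits"] == 0:
            return action  # untried action: infinite UCB bonus
        score = st["value"] + UCB_C * math.sqrt(
            math.log(total_visits) / st["visits"])
        if score > best_score:
            best_action, best_score = action, score
    return best_action


def generative_model(state, action):
    """Toy G(s, a) -> (s', o, r). POMCP only needs samples from a
    simulator like this, never explicit probability tables."""
    if action == "listen":
        other = "tiger-right" if state == "tiger-left" else "tiger-left"
        obs = state if random.random() < 0.85 else other
        return state, obs, -1.0
    # Opening a door ends the episode and resets the tiger.
    bad_door = action.endswith(state.split("-")[1])
    reward = -100.0 if bad_door else 10.0
    return random.choice(["tiger-left", "tiger-right"]), "reset", reward


def reinvigorate(particles, n_new):
    """Crude reinvigoration: pad the belief with copies of surviving
    particles so it doesn't collapse after an unlikely observation.
    (The paper perturbs particles with domain-specific noise; with
    only two discrete states, duplication is all we can do here.)"""
    return particles + [random.choice(particles) for _ in range(n_new)]


if __name__ == "__main__":
    # One simulation step: sample s from the particle belief B(h),
    # pick an action by UCB1, and query the generative model.
    particles = ["tiger-left"] * 60 + ["tiger-right"] * 40  # K = 100
    node = {a: {"visits": 0, "value": 0.0} for a in ACTIONS}
    state = random.choice(particles)
    action = ucb1_select(node)
    next_state, obs, reward = generative_model(state, action)
    print(action, obs, reward)
```

Note how little the simulator exposes: only samples, no transition or observation matrices. That is the assumption the last question asks you to evaluate, and it is also why sparse rewards are worth thinking about, since UCB1 only sees whatever returns the rollouts happen to find.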
Belief-space Planning
Skip the analysis in the last part of section IV.
Questions
- In what way is the planned mean trajectory better than the one found by B-LQR (Fig 1)?
- Why does the executed trajectory differ from the planned one (Fig 2)?
- Would it be possible to use an RRT, rather than trajectory optimization, within this method?
Upload a single PDF file through Canvas by
March 5th at 1 pm.