Assignment #7
Background for all three papers
- We'll be discussing planning methods for Partially Observable Markov Decision Processes (POMDPs). Next class we'll look at learning methods.
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.
POMDP Tutorial
You don't need to read the last 4 sections. Or, if you prefer equations, you can read this paper, skipping the details of the Witness algorithm.
Questions
- Which problems that we have seen in class can be modeled as a POMDP? What is a problem that
cannot be modeled as an MDP or a POMDP (and why)?
- Why can a POMDP be seen as an MDP in belief space?
- What can we say about the shape of the value function of a POMDP? Intuitively, why are the highest values along the outside edges?
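As a concrete anchor for the "MDP in belief space" question, here is a minimal sketch of a Bayes-filter belief update for a hypothetical two-state, tiger-style POMDP (the models and names are invented for illustration, not from the tutorial). Each action/observation pair maps one belief deterministically to the next, which is why the belief itself can serve as the state of an ordinary MDP.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') proportional to O[a][o, s'] * sum_s T[a][s, s'] * b[s]."""
    predicted = b @ T[a]           # predict: marginalize over the previous state
    unnorm = O[a][o] * predicted   # correct: weight by observation likelihood
    return unnorm / unnorm.sum()   # normalize to a probability distribution

# Illustrative models: T[a][s, s'] transition matrix, O[a][o, s'] observation matrix.
T = {"listen": np.array([[1.0, 0.0],
                         [0.0, 1.0]])}    # listening does not move the tiger
O = {"listen": np.array([[0.85, 0.15],    # o=0: "hear left"
                         [0.15, 0.85]])}  # o=1: "hear right"

b = np.array([0.5, 0.5])                 # uniform prior over {left, right}
b = belief_update(b, "listen", 0, T, O)  # observe "hear left"
print(b)  # belief shifts toward the left state: [0.85 0.15]
```

Note that the update is deterministic given (b, a, o); the stochasticity of the POMDP shows up only in which observation o arrives.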
DESPOT: Online POMDP Planning
It's fine to skip the theorems. Also, skip Sections 4.2-4.5 and 5.3.
Questions
- Alice suggests re-invoking the DESPOT algorithm with belief b' if an unanticipated observation occurs, rather than using a default policy. Would this be an improvement?
- Bob suggests re-invoking the DESPOT algorithm with belief b' after every action execution, even if the result is anticipated. Would this be an improvement?
- What's the difference between a belief tree and a policy tree?
- What are conditions under which a policy will have a small size?
- Intuitively, what are the two different "cases" inside the outer max in equation 9?
Belief-space Planning
Skip the analysis in the last part of Section III.
Questions
- In what way is the planned mean trajectory better than the one found by B-LQR (Fig 1)?
- Why does the executed trajectory differ from the planned one (Fig 2)?
- In a domain with dangerous outcomes, which method is safer, DESPOT or the one in this paper?
- Would it be possible to use an RRT, rather than trajectory optimization, within this method?
Upload a single PDF file through Stellar by Mar 5 at 10 am.