Assignment #7
Background for all three papers
- We'll be discussing planning methods for Partially Observable Markov Decision Processes (POMDPs). Next class we'll look at learning methods.
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.
POMDP Tutorial
You don't need to read the last 4 sections. Or, if you prefer equations, you can read this paper, skipping the details of the Witness algorithm.
Questions
- Which problems that we have seen in class can be modeled as a POMDP? What is a problem that
cannot be modeled as an MDP or a POMDP (and why)?
- Why can a POMDP be seen as an MDP in belief space?
- What can we say about the shape of the value function of a POMDP? Intuitively, why are the highest values along the outside edges?
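As a concrete anchor for the "MDP in belief space" question, here is a minimal sketch of a Bayes-filter belief update for a hypothetical two-state, tiger-style POMDP (the models and names are invented for illustration, not from the tutorial). Each action/observation pair maps one belief deterministically to the next, which is why the belief itself can serve as the state of an ordinary MDP.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: b'(s') proportional to O[a][o, s'] * sum_s T[a][s, s'] * b[s]."""
    predicted = b @ T[a]           # predict: marginalize over the previous state
    unnorm = O[a][o] * predicted   # correct: weight by observation likelihood
    return unnorm / unnorm.sum()   # normalize to a probability distribution

# Illustrative models: T[a][s, s'] transition matrix, O[a][o, s'] observation matrix.
T = {"listen": np.array([[1.0, 0.0],
                         [0.0, 1.0]])}    # listening does not move the tiger
O = {"listen": np.array([[0.85, 0.15],    # o=0: "hear left"
                         [0.15, 0.85]])}  # o=1: "hear right"

b = np.array([0.5, 0.5])                 # uniform prior over {left, right}
b = belief_update(b, "listen", 0, T, O)  # observe "hear left"
print(b)  # belief shifts toward the left state: [0.85 0.15]
```

Note that the update is deterministic given (b, a, o); the stochasticity of the POMDP shows up only in which observation o arrives.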
DESPOT: Online POMDP Planning
It's fine to skip the theorems. Also, skip Sections 4.2-4.5 and 5.3.
Questions
- Alice suggests re-invoking the DESPOT algorithm with belief b' if an unanticipated observation occurs, rather than using a default policy. Would this be an improvement?
- Bob suggests re-invoking the DESPOT algorithm with belief b' after every action execution, even if the result is anticipated. Would this be an improvement?
- What's the difference between a belief tree and a policy tree?
- What are conditions under which a policy will have a small size?
- Intuitively, what are the two different "cases" inside the outer max in equation 9?
Belief-space Planning
Skip the analysis in the last part of Section III.
Questions
- In what way is the planned mean trajectory better than the one found by B-LQR (Fig 1)?
- Why does the executed trajectory differ from the planned one (Fig 2)?
- In a domain with dangerous outcomes, which method is safer, DESPOT or the one in this paper?
- Would it be possible to use an RRT, rather than trajectory optimization, within this method?
Upload a single PDF file through Stellar by Mar 5 at 10 am.