Assignment #7
- For this class, we will be discussing planning methods for Partially Observable Markov Decision Processes (POMDPs). Next class we'll look at learning methods.
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are questions to think about (you should discuss at least some of these, but you don't have to address them all):
POMDP Tutorial
You don't need to read the last four sections. Alternatively, if you prefer equations, you can read this paper instead, skipping the details of the Witness algorithm.
Questions
- Which problems that we have seen in class can be modeled as POMDPs? What is a problem that cannot be modeled as an MDP or a POMDP (and why)?
- Why can a POMDP be seen as an MDP in belief space? (See the sketch after this list.)
- What can we say about the shape of the value function of a POMDP? Intuitively, why are the highest values along the outside edges?
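As a reference for the two questions above, here is a sketch of the standard belief update and value-function form, written in the usual tutorial notation (T for transitions, O for observations); double-check the symbols against the reading.

```latex
% Belief update after taking action a and observing o.
% Because b' depends only on (b, a, o), the belief is a
% sufficient statistic, and the process over beliefs is
% itself a (continuous-state) MDP.
b'(s') = \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}
              {\Pr(o \mid b, a)}

% Optimal finite-horizon value function: a maximum over a
% finite set \Gamma of linear functions ("alpha vectors"),
% hence piecewise-linear and convex in b. Convex functions
% on the belief simplex take their largest values toward the
% edges and corners, where the state is (nearly) certain.
V(b) = \max_{\alpha \in \Gamma} \alpha \cdot b
```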
POMCP
Skim the proofs in section 4.
Questions
- How does POMCP address the exploration-exploitation trade-off? What strategy is used for action selection, and how do the belief states affect this? (See the code sketch after this list.)
- POMCP approximates the belief state with K particles. What is particle reinvigoration and why is it helpful? Do you think it will be more useful for smaller or larger K?
- POMCP assumes that there is a generative model G(s, a) -> (s', o, r) that provides a sample of a successor state, observation, and reward. Do you think this is a reasonable assumption? How would POMCP fare with sparse rewards?
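To make the questions above concrete, here is a minimal sketch of the pieces they refer to: UCB1 action selection at a search node (the rule the POMCP paper uses), a black-box generative model G(s, a) -> (s', o, r), and particle reinvigoration. This is not the paper's implementation; the tiger-style toy domain and all names and constants here are assumptions for illustration only.

```python
import math
import random

ACTIONS = ["listen", "open-left", "open-right"]  # toy action set
UCB_C = 1.0  # exploration constant; problem-dependent in practice


def ucb1_select(node):
    """Return argmax_a Q(h,a) + c * sqrt(log N(h) / N(h,a)).

    `node` maps actions to {"visits": N(h,a), "value": Q(h,a)}.
    The bonus term drives exploration of rarely tried actions;
    the value term drives exploitation of good ones.
    """
    total_visits = sum(st["visits"] for st in node.values())
    best_action, best_score = None, -math.inf
    for action, st in node.items():
        if st["visits"] == 0:
            return action  # untried action: infinite UCB bonus
        score = st["value"] + UCB_C * math.sqrt(
            math.log(total_visits) / st["visits"])
        if score > best_score:
            best_action, best_score = action, score
    return best_action


def generative_model(state, action):
    """Toy G(s, a) -> (s', o, r). POMCP only needs samples from a
    simulator like this, never explicit probability tables."""
    if action == "listen":
        other = "tiger-right" if state == "tiger-left" else "tiger-left"
        obs = state if random.random() < 0.85 else other
        return state, obs, -1.0
    # Opening a door ends the episode and resets the tiger.
    bad_door = action.endswith(state.split("-")[1])
    reward = -100.0 if bad_door else 10.0
    return random.choice(["tiger-left", "tiger-right"]), "reset", reward


def reinvigorate(particles, n_new):
    """Crude reinvigoration: pad the belief with copies of surviving
    particles so it doesn't collapse after an unlikely observation.
    (The paper perturbs particles with domain-specific noise; with
    only two discrete states, duplication is all we can do here.)"""
    return particles + [random.choice(particles) for _ in range(n_new)]


if __name__ == "__main__":
    # One simulation step: sample s from the particle belief B(h),
    # pick an action by UCB1, and query the generative model.
    particles = ["tiger-left"] * 60 + ["tiger-right"] * 40  # K = 100
    node = {a: {"visits": 0, "value": 0.0} for a in ACTIONS}
    state = random.choice(particles)
    action = ucb1_select(node)
    next_state, obs, reward = generative_model(state, action)
    print(action, obs, reward)
```

Note how little the simulator exposes: only samples, no transition or observation matrices. That is the assumption the last question asks you to evaluate, and it is also why sparse rewards are worth thinking about, since UCB1 only sees whatever returns the rollouts happen to find.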
Belief-space Planning
Skip the analysis in the last part of section IV.
Questions
- In what way is the planned mean trajectory better than the one found by B-LQR (Fig 1)?
- Why does the executed trajectory differ from the planned one (Fig 2)?
- Would it be possible to use an RRT, rather than trajectory optimization, within this method?
Upload a single PDF file through Canvas by
March 5th at 1 pm.