Assignment #2
The papers for this week focus on the problem of planning or, more generally, action selection given a current state, a transition model, and a goal set. We can turn such problems into a special case of an MDP in which:
- the transitions are deterministic
- the goal states have reward 0, and immediately transition into a zero-reward absorbing state
- all other states have reward -1
Consider a domain with:
- a specified initial state s0
- n discrete states
- m discrete actions
- deterministic transitions
- a goal set G
- a horizon of h (that is, the number of steps from s0 to the "closest" state s in G is h)
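For concreteness, a minimal Python sketch of this setup might look like the following. The names (`Domain`, `next_state`, `ABSORBING`, etc.) are illustrative only and are not part of the assignment.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable

State = Hashable
Action = Hashable
ABSORBING = "absorbing"  # illustrative zero-reward absorbing state

@dataclass(frozen=True)
class Domain:
    states: FrozenSet[State]                      # n discrete states
    actions: FrozenSet[Action]                    # m discrete actions
    next_state: Callable[[State, Action], State]  # deterministic transition model
    s0: State                                     # initial state
    goals: FrozenSet[State]                       # goal set G

def mdp_transition(d: Domain, s: State, a: Action) -> State:
    # Goal states (and the absorbing state itself) transition to the absorbing state.
    return ABSORBING if (s in d.goals or s == ABSORBING) else d.next_state(s, a)

def mdp_reward(d: Domain, s: State) -> int:
    # Goal states and the absorbing state have reward 0; all other states have reward -1.
    return 0 if (s in d.goals or s == ABSORBING) else -1
```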
Both value iteration and state-space search methods can be used to solve these problems. Let's try to understand their relative computational complexity.
Please provide brief answers to the following questions in your writeup. You may want to refer to background texts on value iteration, breadth-first search, and dynamic programming to brush up on those concepts (e.g., see Sutton & Barto Chapter 4).
Value iteration:
- How many iterations of value iteration would be required to find the shortest path from s0 to a goal state?
- What is the computational complexity of using value iteration to find that path, as a function of n, m, and h?
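As a point of reference for these questions, here is a minimal sketch of synchronous value iteration on the MDP encoding above; it reuses the illustrative `Domain`, `mdp_transition`, and `mdp_reward` from the earlier sketch and is not part of the assignment.

```python
def value_iteration(d: Domain, num_iterations: int) -> dict:
    # V[s] converges to the negative of the shortest-path length from s to the goal set;
    # acting greedily with respect to the converged V recovers a shortest path.
    V = {s: 0.0 for s in d.states}
    V[ABSORBING] = 0.0
    for _ in range(num_iterations):
        V_new = {ABSORBING: 0.0}
        for s in d.states:
            if s in d.goals:
                V_new[s] = 0.0
            else:
                # Deterministic Bellman backup: immediate reward plus best successor value.
                V_new[s] = max(mdp_reward(d, s) + V[mdp_transition(d, s, a)]
                               for a in d.actions)
        V = V_new
    return V
```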
Breadth-first search:
If you're doing any kind of state-space search, such as breadth-first search (BFS), you should use some form of visited list or dynamic programming (DP) whenever possible.
- What is the computational complexity (worst case), without any assumption about h, for finding the shortest path to a goal using BFS+DP?
- Now, taking h into consideration, and for simplicity assuming that the domain is a tree (so that there is no advantage to be gained from DP), what is the computational complexity of finding the shortest path using BFS?
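For comparison, here is a minimal sketch of BFS with a visited list (the "DP" that prevents any state from being expanded more than once), again reusing the illustrative `Domain` above:

```python
from collections import deque

def bfs_shortest_path(d: Domain):
    # Returns a shortest action sequence from s0 to some state in G,
    # or None if no goal state is reachable.
    if d.s0 in d.goals:
        return []
    frontier = deque([(d.s0, [])])
    visited = {d.s0}  # the visited list / DP table
    while frontier:
        s, path = frontier.popleft()
        for a in d.actions:
            s_next = d.next_state(s, a)
            if s_next in visited:
                continue
            if s_next in d.goals:
                return path + [a]
            visited.add(s_next)
            frontier.append((s_next, path + [a]))
    return None
```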
Comparison:
- Under what conditions on n, m, and h would it be a better idea to do value iteration than BFS, and vice versa?
- How does your answer change if you'll be asked to solve many action-selection problems, with different s0 but the same G, in the same domain?
- How does your answer change if you'll be asked to solve many action-selection problems, with different s0 and G?
Heuristic:
Heuristics can reduce the "effective branching factor" of BFS by pruning or postponing consideration of some action choices at each state in the search.
Consider a heuristic that cuts the branching factor from m down to a*m, for 0 < a <= 1 (see the sketch after the questions below).
- Assuming BFS with no opportunity for DP and a horizon of h, what effect does a have on the complexity of the search for a shortest path?
- Heuristics tend to be derived from "relaxed" versions of the problem that are:
- computationally easier to solve than the original problem
- cheaper (in terms of path cost) to solve than the original problem
Why are both of these features important?
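To make the "effective branching factor" idea concrete, one can think of the heuristic as an action filter applied at every expansion. The sketch below is illustrative only: the cost-to-go estimate `h_est` and the fraction `a` are assumptions, and `Domain` is the one from the earlier sketch.

```python
import math

def pruned_actions(d: Domain, s: State, h_est, a: float):
    # Keep only the roughly a*m actions whose successors look most promising
    # under the (assumed) heuristic cost-to-go estimate h_est, so a search that
    # uses this filter expands about a*m children per state instead of m.
    k = max(1, math.ceil(a * len(d.actions)))
    return sorted(d.actions, key=lambda act: h_est(d.next_state(s, act)))[:k]
```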
In the next two classes, we'll consider model-based decision-making in stochastic domains and continuous domains. A bit later in the term, we'll consider learning with long horizons, the relative merits of model-based vs. model-free learning, and the importance of good exploration strategies.
Papers
Also provide a short discussion of each of the two assigned papers (listed under Course Materials). Keep in mind the implications of your answers from above.
Upload a single PDF file through Stellar by Feb 11 at 10 am.