Assignment #2

The papers for this week focus on the problem of planning or, more generally, action selection given a current state, a transition model, and a goal set. We can turn such problems into a special case of an MDP in which:

Consider a domain with:

Both value iteration and state-space search methods can be used to solve these problems. Let's try to understand their relative computational complexity. Please provide brief answers to the following questions in your writeup. You may want to refer to background texts on value iteration, breadth-first search, and dynamic programming to brush up on those concepts (e.g., see Sutton & Barto Chapter 4).
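For concreteness, here is a minimal sketch of tabular value iteration for a deterministic, goal-reaching problem, viewed as an MDP with unit step costs and absorbing, zero-cost goal states. The dictionary-based model and the names (states, actions, transition, goals) are illustrative assumptions, not part of the assignment; note that each sweep performs on the order of |S| x |A| Bellman backups.

```python
def value_iteration(states, actions, transition, goals, max_sweeps=1000):
    """Cost-to-go value iteration for a deterministic, unit-cost problem.

    transition(s, a) -> next state (assumed deterministic and total).
    Goal states are absorbing with cost-to-go 0; all other states start
    at infinity and are driven down by repeated Bellman backups.
    """
    INF = float("inf")
    V = {s: (0.0 if s in goals else INF) for s in states}
    for _ in range(max_sweeps):
        changed = False
        for s in states:
            if s in goals:
                continue
            # Bellman backup: one step of cost plus the best successor's value.
            best = min((1.0 + V[transition(s, a)] for a in actions), default=INF)
            if best != V[s]:
                V[s] = best
                changed = True
        if not changed:  # converged: a full sweep changed nothing
            break
    return V
```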

Value iteration:

Breadth-first search:
If you're doing any kind of state-space search, such as breadth-first search (BFS), you should, whenever possible, use some form of visited list or dynamic programming (DP) so that no state is expanded more than once; see the sketch below.
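As an illustrative example (the names and the deterministic transition(s, a) model are assumptions, not from the handout), here is BFS with a visited set, so each state is generated and expanded at most once.

```python
from collections import deque

def bfs(start, goals, actions, transition):
    """Return a shortest action sequence from start to some goal, or None."""
    if start in goals:
        return []
    visited = {start}                 # the "visited list" / DP table
    frontier = deque([(start, [])])   # FIFO queue of (state, plan-so-far)
    while frontier:
        s, plan = frontier.popleft()
        for a in actions:
            s2 = transition(s, a)
            if s2 in visited:         # already reached by a path at least as short
                continue
            if s2 in goals:
                return plan + [a]
            visited.add(s2)
            frontier.append((s2, plan + [a]))
    return None                       # goal unreachable from start
```

Without the visited set, the same state can be re-expanded many times, and the work grows with the number of paths rather than the number of states.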

Comparison

Heuristic
Heuristics can reduce the "effective branching factor" of BFS by pruning or postponing consideration of some action choices at each state in the search. Consider a heuristic that cuts the branching factor from m down to a*m for 0 < a <= 1, as sketched below.
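One illustrative way to realize such a heuristic is sketched below under assumed names (heuristic(s) estimates cost-to-goal, and a is the retained fraction): at every expansion, rank the m successors by the heuristic and keep only the best a*m of them. This cuts the effective branching factor as described, at the possible cost of completeness and optimality when the heuristic misranks the good successors.

```python
import math
from collections import deque

def pruned_bfs(start, goals, actions, transition, heuristic, a=0.5):
    """BFS that expands only the best a*m successors of each state,
    ranked by heuristic(s), an estimate of cost-to-goal."""
    keep = max(1, math.ceil(a * len(actions)))   # effective branching factor a*m
    visited = {start}
    frontier = deque([(start, [])])
    while frontier:
        s, plan = frontier.popleft()
        if s in goals:
            return plan
        # Rank this state's actions by the heuristic value of their successors,
        # then prune all but the best 'keep' of them.
        ranked = sorted(actions, key=lambda act: heuristic(transition(s, act)))
        for act in ranked[:keep]:
            s2 = transition(s, act)
            if s2 not in visited:
                visited.add(s2)
                frontier.append((s2, plan + [act]))
    return None
```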

In the next two classes, we'll consider model-based decision-making in stochastic domains and continuous domains. A bit later in the term, we'll consider learning over long horizons, the relative merits of model-based vs. model-free learning, and the importance of good exploration strategies.

Papers
Also provide a short discussion of each of the two assigned papers (listed under Course Materials). Keep in mind the implications of your answers above.

Upload a single PDF file through Stellar by Feb 11 at 10 am.