- the transitions are deterministic
- the goal states have reward 0, and immediately transition into a zero-reward absorbing state
- all other states have reward -1

Consider a domain with:

- a specified initial state s0
- n discrete states
- m discrete actions
- deterministic transitions
- a goal set G
- a horizon h (that is, the number of steps on the shortest path from s0 to the "closest" state s in G is h)

Both value iteration and state-space search methods can be used to solve these problems. Let's try to understand their relative computational complexity. Please provide brief answers to the following questions in your writeup.

**Value iteration:**

- How many iterations of value iteration would be required to find the shortest path from s0 to a goal state?
- What is the computational complexity of using value iteration to find that path, as a function of n, m, and h?
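To make the setup concrete, here is a minimal sketch of value iteration in this deterministic, goal-based setting. The transition function `T`, goal set `G`, and the chain domain are hypothetical names for illustration, not part of the assignment; note that each sweep touches every state-action pair.

```python
def value_iteration(states, actions, T, G, iters):
    """Value iteration for a deterministic shortest-path MDP:
    goal states have value 0 (reward 0, then absorbing); every
    other step costs 1 (reward -1)."""
    V = {s: 0.0 if s in G else -float("inf") for s in states}
    for _ in range(iters):
        V = {s: 0.0 if s in G
             else max(-1.0 + V[T(s, a)] for a in actions)
             for s in states}
    return V

# Hypothetical 4-state chain 0 -> 1 -> 2 -> 3, with goal state 3.
T = lambda s, a: min(s + 1, 3) if a == "R" else max(s - 1, 0)
V = value_iteration(range(4), ["L", "R"], T, G={3}, iters=3)
# V[0] is -3.0: three -1 steps from state 0 to the goal.
```

Counting the work done per sweep, and the number of sweeps needed before V(s0) stops changing, is a good way to approach the questions above.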

**Breadth-first search:**

If you're doing any kind of state-space search, such as Breadth-First Search (BFS), you should, whenever possible, use some form of visited list or dynamic programming (DP).

- What is the computational complexity, without any assumption about h, for finding the shortest path to a goal using BFS+DP?
- Now, taking h into consideration, and for simplicity assuming that the domain is a tree (so that there is no advantage to be gained from DP), what is the computational complexity of finding the shortest path using BFS?
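As one illustrative sketch (the transition function `T` and goal set `G` are hypothetical names), BFS+DP can be written with a visited set playing the DP role: each state is expanded at most once, and expanding a state tries every action.

```python
from collections import deque

def bfs_shortest_path(s0, actions, T, G):
    """BFS with a visited set (the DP ingredient): each state is
    expanded at most once, and expanding a state tries all actions.
    Returns a shortest action sequence from s0 to a goal, or None."""
    if s0 in G:
        return []
    visited = {s0}
    frontier = deque([(s0, [])])
    while frontier:
        s, path = frontier.popleft()
        for a in actions:
            s2 = T(s, a)
            if s2 in G:
                return path + [a]
            if s2 not in visited:
                visited.add(s2)
                frontier.append((s2, path + [a]))
    return None

# Same hypothetical 4-state chain, goal state 3:
T = lambda s, a: min(s + 1, 3) if a == "R" else max(s - 1, 0)
# bfs_shortest_path(0, ["L", "R"], T, {3}) -> ["R", "R", "R"]
```

For the tree-shaped variant of the question, imagine deleting the visited set and counting the nodes generated level by level.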

**Comparison**

- Under what conditions on n, m, and h would it be a better idea to do value iteration than BFS, and vice versa?
- How does your answer change if you'll be asked to solve many action-selection problems, with different s0 but the same G, in the same domain?
- How does your answer change if you'll be asked to solve many action-selection problems, with different s0 and G?

**Heuristic**

Heuristics can reduce the "effective branching factor" of BFS by pruning or postponing consideration of some action choices at each state in the search.
Consider a heuristic that cuts the branching factor from m down to *a*·m, for 0 < *a* <= 1.

- Assuming BFS with no opportunity for DP and a horizon of h, what effect does *a* have on the complexity of the search for a shortest path?
- Heuristics tend to be derived from "relaxed" versions of the problem that are:
  - computationally easier to solve than the original problem
  - cheaper (in terms of path cost) to solve than the original problem
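One way to picture the effect of *a*: a heuristic ranks the successors of each state, and the search expands only the best ceil(*a*·m) of the m actions. A sketch under the tree/no-DP assumption of the question; the heuristic, `T`, and the chain domain are all hypothetical illustrations.

```python
import math
from collections import deque

def pruned_bfs(s0, actions, T, G, heuristic, alpha):
    """BFS that keeps only the ceil(alpha * m) actions whose successors
    score best under `heuristic` (a hypothetical cost-to-go estimate,
    e.g. derived from a relaxed problem). The effective branching factor
    drops from m to alpha * m, so with no visited set the number of
    nodes at depth h scales with (alpha * m) ** h rather than m ** h."""
    k = max(1, math.ceil(alpha * len(actions)))
    frontier = deque([(s0, [])])
    while frontier:
        s, path = frontier.popleft()
        if s in G:
            return path
        best = sorted(actions, key=lambda a: heuristic(T(s, a)))[:k]
        frontier.extend((T(s, a), path + [a]) for a in best)
    return None

# Hypothetical 4-state chain with goal 3; heuristic = distance to goal.
T = lambda s, a: min(s + 1, 3) if a == "R" else max(s - 1, 0)
# pruned_bfs(0, ["L", "R"], T, {3}, lambda s: 3 - s, 0.5) -> ["R", "R", "R"]
```

Note that a misleading heuristic combined with pruning and no visited set can cause this sketch to miss the shortest path or loop forever; it is only an illustration of the reduced branching factor.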

In the next two classes, we'll consider model-based decision-making in stochastic domains and continuous domains. A bit later in the term, we'll consider learning over long horizons, the relative merits of model-based vs. model-free learning, and the importance of good exploration strategies.

**Papers**

Provide a short discussion of each of the assigned papers (listed under Course Materials). Keep in mind the implications of your answers from above. Also, include questions for discussion in class.

Upload a single PDF file through Stellar by **Feb 12 at 10 am**.