Assignment #2

The papers for this week focus on the problem of planning or, more generally, action selection given a current state, a transition model, and a goal set. Please provide brief answers to the following questions in your writeup.

Warm-up
You may want to refer to background texts on breadth-first search, value iteration, and dynamic programming to brush up on those concepts (e.g., see Sutton & Barto: Chapter 4).
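As a quick refresher, value iteration repeatedly applies the Bellman backup until the value function stops changing. Here is a minimal sketch on a hypothetical toy problem (not from the readings): a four-state chain where state 3 is an absorbing goal and every step costs 1 (reward −1).

```python
# Value iteration on a toy deterministic chain MDP (hypothetical example).
# States 0..3; state 3 is an absorbing goal; actions move left or right.

def value_iteration(n_states=4, goal=3, gamma=1.0, tol=1e-6):
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            if s == goal:
                continue  # absorbing goal keeps value 0
            # deterministic successors: step left or right, clipped to the chain
            successors = [max(s - 1, 0), min(s + 1, n_states - 1)]
            best = max(-1.0 + gamma * V[s2] for s2 in successors)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

The converged values are just the negated distances to the goal, e.g. `value_iteration()` returns approximately `[-3, -2, -1, 0]`.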


Bonet & Geffner: Heuristic Search Planner
Heuristics tend to be derived from “relaxed” versions of the problem that are:

  1. computationally easier to solve than the original problem
  2. guaranteed not to overestimate the solution cost of the original problem
Why are both of these features important? What relaxation do the heuristics in the HSP and HSP2 planners use, and why do you think this form of relaxation is so effective?
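To make the relaxation concrete, here is a minimal sketch of the additive heuristic h_add in the style of HSP, assuming a simple STRIPS-like encoding (atoms as strings, actions as precondition/add-effect pairs; this is an illustration, not HSP's actual code). The delete relaxation drops every delete list, so the cost of each atom can be computed by a Bellman-style fixpoint; note that summing atom costs can overcount shared subgoals, so h_add is informative but not admissible.

```python
# Additive heuristic over the delete relaxation (assumed STRIPS encoding).
import math

def h_add(state, goal, actions):
    """state/goal: sets of atoms; actions: list of (pre, add, cost) tuples."""
    cost = {atom: 0.0 for atom in state}  # atoms true in the state are free
    changed = True
    while changed:  # fixpoint over atom costs (deletes are ignored)
        changed = False
        for pre, add, c in actions:
            if all(p in cost for p in pre):
                base = c + sum(cost[p] for p in pre)  # additive cost of preconditions
                for a in add:
                    if base < cost.get(a, math.inf):
                        cost[a] = base
                        changed = True
    return sum(cost.get(g, math.inf) for g in goal)
```

For example, with actions a→b and b→c of unit cost, `h_add({"a"}, {"c"}, actions)` is 2.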


Iterated Width (IW) Search
This paper introduces a breadth-first search method based on a measure of “novelty.”

Explain in your own words what novelty means. How is this definition related to the intuitive meaning of novelty?

Why does this method accelerate search on real problems? What property of real problems is this method exploiting? Why does pruning based on low novelty make sense?
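As a concrete reference point while answering, here is a minimal sketch of the novelty-1 test at the heart of IW(1), assuming states are encoded as sets of boolean atoms (a hypothetical encoding for illustration): a generated state is kept only if it makes some atom true for the first time in the search.

```python
# Novelty-1 pruning test for IW(1); states are frozensets of atoms (assumed encoding).

def make_novelty1_pruner():
    seen = set()  # atoms made true by any state generated so far

    def is_novel(state):
        new_atoms = [a for a in state if a not in seen]
        seen.update(state)
        return bool(new_atoms)  # prune the state when this is False

    return is_novel
```

IW(k) generalizes this by tracking tuples of up to k atoms instead of single atoms.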


Planning with Pixels
This paper uses simple pixel colors as the features for an IW search. What might be better features to use?

This method works surprisingly well on Atari games. What issues do you think it might encounter if you tried to apply it to more complex video games, or real-world robotics problems?
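For intuition about the feature question above, here is a hedged sketch (details assumed, not the paper's code) of turning a raw frame into boolean atoms for novelty testing, in the spirit of Planning with Pixels: each (tile row, tile column, color) triple that occurs in the frame becomes one atom.

```python
# Pixel-based atoms for IW novelty tests (assumed scheme, for illustration).

def pixel_atoms(frame, tile=4):
    """frame: 2D list of color indices; returns the set of (row, col, color) atoms
    at tile resolution."""
    atoms = set()
    for r, row in enumerate(frame):
        for c, color in enumerate(row):
            atoms.add((r // tile, c // tile, color))
    return atoms
```

Coarser tiles or richer features (objects, edges, learned embeddings) change which states count as novel, which is one way to think about the "better features" question.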


In the next two classes, we'll consider model-based decision-making in stochastic domains and continuous domains. A bit later in the term, we'll consider learning with long horizons, the relative merits of model-based vs. model-free learning, and the importance of good exploration strategies.

Upload a single PDF file through Canvas by Feb 13 at 1pm.