6.882 Embodied Intelligence, Spring 2020

Assignment #15

Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.

Markov Games

Questions

Why is it that, in games such as checkers, backgammon and Go, "the minimax operator (in minimax Q) can be implemented extremely efficiently"?
What are some real-life examples of Markov Games?
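A minimal sketch of one reading of the quoted claim, to be checked against Littman's paper. Minimax-Q backs up V(s) = max over mixed policies π of min over opponent actions o of Σ_a π(a) Q(s, a, o), which in general requires solving a linear program. In turn-taking games only one player chooses at each state, so the matrix game collapses to a single row or column and the backup becomes a plain max or min. The contrast below uses a closed-form 2x2 zero-sum game in place of a general LP solver; the function names are hypothetical.

```python
# Minimal sketch contrasting the minimax backup in turn-taking games with a
# simultaneous-move matrix game. All names here are hypothetical.
from typing import Sequence, Tuple


def alternating_minimax(q_values: Sequence[float], to_move: str) -> float:
    """Minimax backup at a state where only one player chooses.

    q_values holds Q(s, a) for the mover's legal actions; the other player
    has a single no-op action, so the matrix game is one row or one column.
    """
    return max(q_values) if to_move == "max" else min(q_values)


def matrix_game_value_2x2(m: Tuple[Tuple[float, float], Tuple[float, float]]) -> float:
    """Value of a 2x2 zero-sum game for the row (max) player, mixed strategies allowed."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best payoff the row player can guarantee with a pure row
    minimax = min(max(a, c), max(b, d))  # best pure column for the opponent
    if maximin == minimax:               # saddle point: pure strategies suffice
        return maximin
    # No saddle point: both players must mix; closed-form value for the 2x2 case.
    return (a * d - b * c) / (a + d - b - c)


print(alternating_minimax([0.2, 0.7, -0.1], "max"))  # 0.7
print(alternating_minimax([0.2, 0.7, -0.1], "min"))  # -0.1
print(matrix_game_value_2x2(((1, -1), (-1, 1))))     # matching pennies: 0.0
```

In matching pennies the best pure row guarantees only -1, while the mixed-strategy value is 0. For larger simultaneous-move games that mixed value comes from a linear program at every backup, which is the cost the alternating case avoids.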

TD-Gammon

Questions

Compare TD(0), that is, TD(lambda) with lambda = 0, to Deep Q-learning (without experience replay). It might be useful to skim Chapter 6 of Sutton and Barto's book.
What is the role of non-zero lambda? Is there a connection to experience replay?
Why can TD-Gammon get away with estimating a value function (called "equity" here) instead of a Q-value function?
What arguments does Tesauro give for why self-play works in this domain? Relate these arguments to Littman's discussion.
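A minimal tabular sketch for comparing the updates in the first two questions, under simplifying assumptions: a dict stands in for the Q-network, and all function names are hypothetical. Deep Q-learning as usually described also uses a separate target network and takes a gradient step on the squared TD error rather than a table update. The td_lambda_episode function uses online accumulating eligibility traces; TD-Gammon applies the same idea to a neural network, where the trace accumulates gradients of the network output with respect to its weights.

```python
# Minimal tabular sketch (hypothetical names). A dict stands in for the
# Q-network; DQN itself also uses a target network and a gradient step.
from typing import Dict, Hashable, List, Sequence, Tuple

Transition = Tuple[Hashable, float, Hashable, bool]  # (s, r, s_next, terminal)


def td0_update(V: Dict[Hashable, float], s: Hashable, r: float, s_next: Hashable,
               alpha: float, gamma: float, terminal: bool = False) -> None:
    """TD(0) prediction: move V(s) toward r + gamma * V(s')."""
    target = r + (0.0 if terminal else gamma * V[s_next])
    V[s] += alpha * (target - V[s])


def q_learning_update(Q: Dict[tuple, float], s: Hashable, a: Hashable, r: float,
                      s_next: Hashable, actions: Sequence, alpha: float,
                      gamma: float, terminal: bool = False) -> None:
    """One-step Q-learning: move Q(s, a) toward r + gamma * max_b Q(s', b)."""
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def td_lambda_episode(V: Dict[Hashable, float], episode: List[Transition],
                      alpha: float, gamma: float, lam: float) -> Dict[Hashable, float]:
    """Online TD(lambda) with accumulating traces; lam = 0 reduces to TD(0)."""
    e: Dict[Hashable, float] = {}  # eligibility trace for each visited state
    for s, r, s_next, terminal in episode:
        target = r + (0.0 if terminal else gamma * V[s_next])
        delta = target - V[s]            # one-step TD error
        e[s] = e.get(s, 0.0) + 1.0       # bump the trace of the current state
        for x in e:                      # credit every recently visited state
            V[x] += alpha * delta * e[x]
            e[x] *= gamma * lam          # decay all traces
    return V


V = {"A": 0.0, "B": 1.0}
td0_update(V, "A", 0.5, "B", alpha=0.5, gamma=0.9)

Q = {("A", "l"): 0.0, ("A", "r"): 0.0, ("B", "l"): 2.0, ("B", "r"): 1.0}
q_learning_update(Q, "A", "r", 0.0, "B", ["l", "r"], alpha=0.5, gamma=0.5)

episode = [("A", 0.0, "B", False), ("B", 1.0, "T", True)]  # reward only at the end
v0 = td_lambda_episode({"A": 0.0, "B": 0.0, "T": 0.0}, episode, 0.5, 1.0, lam=0.0)
v1 = td_lambda_episode({"A": 0.0, "B": 0.0, "T": 0.0}, episode, 0.5, 1.0, lam=1.0)
print(v0["A"], v1["A"])  # 0.0 0.5
```

The targets differ in what they bootstrap from: TD(0) uses V(s') and so evaluates whichever policy generated the transition, while Q-learning uses max_b Q(s', b). In the two-step episode, the delayed reward reaches A within the same pass only when lam > 0.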

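One way to approach the "equity" question, as a sketch under stated assumptions: in backgammon the dice are rolled before the move is chosen, and the rules determine exactly which position each legal move produces, so a state-value function can rank moves by the positions they lead to. The toy game and all function names below are hypothetical stand-ins, not backgammon or Tesauro's code, and values are assumed to be from the mover's perspective.

```python
# Minimal sketch of move selection with a state-value function when the dice
# are already observed and move outcomes are deterministic. Toy game only.
from typing import Callable, Hashable, Iterable, Tuple


def greedy_move(state: Hashable, dice: Tuple[int, ...],
                legal_moves: Callable[[Hashable, Tuple[int, ...]], Iterable],
                apply_move: Callable[[Hashable, object], Hashable],
                value: Callable[[Hashable], float]):
    """Pick the legal move whose resulting position has the highest value."""
    return max(legal_moves(state, dice), key=lambda m: value(apply_move(state, m)))


# Hypothetical toy game: a token at an integer position may advance by
# either die, and positions closer to 10 are assumed to be better.
def toy_legal_moves(pos: int, dice: Tuple[int, ...]) -> list:
    return list(dice)


def toy_apply(pos: int, step: int) -> int:
    return pos + step


def toy_value(pos: int) -> float:
    return -abs(10 - pos)


print(greedy_move(5, (2, 4), toy_legal_moves, toy_apply, toy_value))  # 4
```

Because each candidate move maps to a known position, comparing V over those positions does the work that comparing Q(s, a) over actions would otherwise do.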
Upload a single PDF file through Stellar by Apr 14 at 10 am.