Assignment #15
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.
Markov Games
Questions
- Why is it that in games such as checkers, backgammon, and Go, "the minimax operator (in minimax Q) can be implemented extremely efficiently"?
- What are some real-life examples of Markov Games?
TD-Gammon
Questions
- Compare TD(0), that is, TD(lambda) with lambda = 0, to Deep Q-learning (without experience replay). It might be useful to skim Chapter 6 of Sutton and Barto's book.
- What is the role of non-zero lambda? Is there a connection to experience replay?
- Why can TD-Gammon get away with estimating a value function (called "equity" here) instead of a Q-value function?
- What arguments does Tesauro give for why self-play works in this domain? Discuss this in light of Littman's discussion.
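As background for the TD(0) and non-zero lambda questions above, here is a minimal sketch of the tabular TD(lambda) update with accumulating eligibility traces. It is a simplification for illustration only: TD-Gammon uses a neural network rather than a table, and the function name, the two-state episode, and the parameter values are assumptions, not taken from the papers.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.0):
    """Apply one episode of tabular TD(lambda) with accumulating traces.

    values:  dict mapping state -> value estimate (updated in place)
    episode: list of (state, reward, next_state) tuples; next_state is
             None when the episode terminates
    (Hypothetical helper for illustration; not from the assigned papers.)
    """
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        # Decay all traces, then mark the state that was just visited.
        for s in traces:
            traces[s] *= gamma * lam
        traces[state] += 1.0
        # Every state moves in proportion to its current trace.
        for s in values:
            values[s] += alpha * td_error * traces[s]
    return values


# Made-up two-step episode: A -> B -> terminal, with reward 1 at the end.
episode = [("A", 0.0, "B"), ("B", 1.0, None)]
for lam in (0.0, 0.5, 1.0):
    start = {"A": 0.0, "B": 0.0}
    print(lam, td_lambda_episode(start, episode, alpha=0.5, lam=lam))
```

With lam = 0 the traces vanish after each step, so only the state just visited is updated, which is the TD(0) case. With lam > 0 the final reward also changes the estimate for A within the same episode.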
Upload a single PDF file through Stellar by Apr 14 at 10 am.