Assignment #14
Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.
Survey of MARL
Sections II.B through IV.B are good background reading.
- Would a policy-gradient-based method work in this same setting? What would be the advantages or disadvantages relative to QMix?
- Is QMix susceptible to coordination problems?
- Can you give an example of a game for which the QMix approximation would be particularly bad?
Learning to communicate with deep MARL
- Imagine a situation in which agent 1 and agent 2 are going to be in two different rooms, but the action each should take depends on the state of both rooms (and each agent can observe only its own room, and only after reaching it and looking around). The ideal strategy would be for each agent to go into a room and then send a signal to the other indicating what it found. But how can they learn this signalling protocol?
- How does DIAL make it easier for them to learn to signal?
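To make the two-room scenario concrete, here is a minimal toy sketch. It rests on assumptions that are not in the reading: each room's state is a single random bit, and the correct action for both agents is the XOR of the two bits (a hypothetical rule chosen so that neither room alone determines the answer). The code enumerates every deterministic policy that uses only the agent's own room, then compares them against a hand-coded "send what you saw" protocol. It shows why signalling is worth learning. It does not show how DIAL or any other method learns it.

```python
import itertools

# Uniform distribution over the four joint room states (s1, s2).
STATES = [(s1, s2) for s1 in (0, 1) for s2 in (0, 1)]


def target(s1, s2):
    # Hypothetical rule: both agents should output s1 XOR s2,
    # which neither agent can infer from its own room alone.
    return s1 ^ s2


def reward(a1, a2, s1, s2):
    # Team reward: 1 only if BOTH agents choose the correct action.
    return 1.0 if a1 == a2 == target(s1, s2) else 0.0


def best_without_communication():
    # A deterministic policy maps the agent's own room bit to an action,
    # encoded as the tuple (action if bit is 0, action if bit is 1).
    policies = list(itertools.product((0, 1), repeat=2))
    best = 0.0
    for f, g in itertools.product(policies, repeat=2):
        value = sum(reward(f[s1], g[s2], s1, s2) for s1, s2 in STATES) / len(STATES)
        best = max(best, value)
    return best


def value_with_communication():
    # Hand-coded protocol: each agent broadcasts the bit it observed,
    # so both can compute the XOR and act correctly.
    total = 0.0
    for s1, s2 in STATES:
        msg1, msg2 = s1, s2
        a1 = s1 ^ msg2
        a2 = msg1 ^ s2
        total += reward(a1, a2, s1, s2)
    return total / len(STATES)


print(best_without_communication())  # 0.5
print(value_with_communication())    # 1.0
```

Without messages, the best any pair of policies can achieve is an expected reward of 0.5 (for example, both agents always choose 0). With the one-bit signal, the expected reward rises to 1.0. The open question in the bullets above is how agents could discover such a protocol from reward alone.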
Upload a single PDF file through Stellar by Apr 9 at 10 am.