6.882 Embodied Intelligence, Spring 2020

Assignment #14

Provide a short discussion of each of the assigned papers (listed under Course Materials). Below are some questions to think about.

Survey of MARL

Sections II.B through IV.B are good background reading.

QMix

Questions

Would a policy-gradient-based method work in this same setting? What would be the advantages or disadvantages relative to QMix?
Is QMix susceptible to coordination problems?
Can you give an example of a game for which the QMix approximation would be particularly bad?

Learning to communicate with deep MARL

Questions

Imagine a situation in which agent 1 and agent 2 will be in two different rooms, but the action each agent takes should depend on the state of both rooms (which neither agent can observe until both have reached their respective rooms and looked around). The ideal strategy would be for each agent to go into its room and then send a signal to the other agent to indicate what it found. But how can they learn this signalling protocol?
How does DIAL make it easier for them to learn to signal?
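
To make the two-room scenario above concrete, here is a minimal toy version in Python. Everything in it is an illustrative assumption rather than part of the assignment or of the DIAL paper: each room has a binary state, each agent sends one binary message, and the shared reward is 1 only when both agents choose the XOR of the two room states. The names `reward`, `play`, and `expected_return` are hypothetical helpers. The sketch only compares two hand-written protocols; it does not show how such a protocol could be learned.

```python
import itertools

def reward(room_states, actions):
    """Shared reward (assumed): 1.0 if both agents pick the XOR of the two room states."""
    target = room_states[0] ^ room_states[1]
    return 1.0 if actions == (target, target) else 0.0

def play(policy_msg, policy_act, room_states):
    """One episode: each agent observes its own room, signals, then acts."""
    # Each agent sees only its own room and sends a message based on it.
    msgs = tuple(policy_msg[i][room_states[i]] for i in range(2))
    # Each agent acts on its own observation plus the other agent's message.
    actions = tuple(
        policy_act[i][(room_states[i], msgs[1 - i])] for i in range(2)
    )
    return reward(room_states, actions)

def expected_return(policy_msg, policy_act):
    """Average reward over the four equally likely room-state pairs."""
    states = list(itertools.product(range(2), repeat=2))
    return sum(play(policy_msg, policy_act, s) for s in states) / len(states)

# Informative protocol: each agent's message is its own room state.
informative_msg = [{0: 0, 1: 1}, {0: 0, 1: 1}]
# Uninformative protocol: each agent always sends 0.
silent_msg = [{0: 0, 1: 0}, {0: 0, 1: 0}]
# Action rule: XOR of own observation and the received message.
xor_act = [{(o, m): o ^ m for o in range(2) for m in range(2)} for _ in range(2)]

informative_return = expected_return(informative_msg, xor_act)
silent_return = expected_return(silent_msg, xor_act)
print(informative_return, silent_return)
```

Under these assumptions the informative protocol earns the full reward in every episode, while the silent one succeeds only when both rooms are in state 0. The message itself never appears in the reward, which is why learning a useful protocol from reward alone is the hard part.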

Upload a single PDF file through Stellar by Apr 9 at 10 am.