Paper Title

RODE: Learning Roles to Decompose Multi-Agent Tasks

Paper Authors

Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang

Abstract

Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles. However, it is largely unclear how to efficiently discover such a set of roles. To solve this problem, we propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents. Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces. We further integrate information about action effects into the role policies to boost learning efficiency and policy generalization. By virtue of these advances, our method (1) outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark and (2) achieves rapid transfer to new environments with three times the number of agents. Demonstrative videos are available at https://sites.google.com/view/rode-marl.
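
The abstract's core mechanism is clustering primitive actions by their effects to form restricted role action spaces. Below is a minimal sketch of that clustering step only, not the authors' implementation: it assumes action representations have already been learned by an effect-prediction model (here replaced with random placeholders), and all names (action_repr, n_roles, role_action_spaces) are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_actions, embed_dim, n_roles = 12, 8, 3

# Placeholder for learned action representations. In the paper's setup these
# would be trained to capture each action's effect on the environment and
# other agents (e.g., predicted next observations and rewards); random
# vectors stand in here so the sketch is self-contained.
action_repr = rng.normal(size=(n_actions, embed_dim))

# Cluster actions with similar effects into the same role.
kmeans = KMeans(n_clusters=n_roles, n_init=10, random_state=0).fit(action_repr)

# Each cluster induces a restricted role action space: the subset of
# primitive actions that role's policy is allowed to select from. The role
# selector would then pick among these n_roles subspaces at a lower temporal
# resolution than the per-step role policies.
role_action_spaces = [np.flatnonzero(kmeans.labels_ == r) for r in range(n_roles)]
for r, acts in enumerate(role_action_spaces):
    print(f"role {r}: primitive actions {acts.tolist()}")
```

Because each role policy only searches within its own action subset, and the role selector only searches over n_roles options, the joint search space shrinks at both levels of the hierarchy, which is the scalability argument the abstract makes.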
