Paper Title
Multi-agent Deep Covering Skill Discovery
Paper Authors
Paper Abstract
The use of skills (a.k.a. options) can greatly accelerate exploration in reinforcement learning, especially when only sparse reward signals are available. While option discovery methods have been proposed for individual agents, in multi-agent reinforcement learning (MARL) settings, discovering collaborative options that coordinate the behavior of multiple agents and encourage them to visit under-explored regions of their joint state space has not been considered. To this end, we propose Multi-agent Deep Covering Option Discovery, which constructs multi-agent options by minimizing the expected cover time of the agents' joint state space. We further propose a novel framework for adopting these multi-agent options in the MARL process. In practice, a multi-agent task can usually be divided into sub-tasks, each of which can be completed by a sub-group of the agents. Our framework therefore first leverages an attention mechanism to find collaborative agent sub-groups that would benefit most from coordinated actions. Then, a hierarchical algorithm, namely HA-MSAC, is developed that first learns the multi-agent options for each sub-group to complete its sub-task, and then integrates them through a high-level policy as the solution to the whole task. This hierarchical option construction allows our framework to strike a balance between scalability and effective collaboration among the agents. Evaluations on multi-agent collaborative tasks show that the proposed algorithm can effectively capture agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works that use single-agent options or no options, in terms of both faster exploration and higher task rewards.
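To make the "minimizing the expected cover time" step concrete, below is a minimal sketch of how a covering option can be derived from the Fiedler vector (the second-smallest eigenvector of the graph Laplacian) of a state-transition graph, here applied to a toy stand-in for the agents' joint state space. This is not the paper's implementation: the function names, the tabular graph representation, and the path-graph example are illustrative assumptions; the paper's deep variant would approximate this quantity with neural networks over the joint state space.

```python
# Sketch: covering option discovery on a (toy) joint state-transition graph.
# Connecting the two states with extreme Fiedler-vector values increases the
# graph's algebraic connectivity, which tightens the upper bound on the
# expected cover time of a random walk over the space.
import numpy as np

def fiedler_vector(adjacency: np.ndarray) -> np.ndarray:
    """Second-smallest eigenvector of the unnormalized graph Laplacian."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues ascending
    return eigvecs[:, 1]                          # column 1 = Fiedler vector

def covering_option(adjacency: np.ndarray) -> tuple[int, int]:
    """Return (initiation_state, termination_state) of one covering option."""
    f = fiedler_vector(adjacency)
    return int(np.argmax(f)), int(np.argmin(f))

# Toy joint state space: a path graph, whose poorly connected endpoints are
# exactly the under-explored regions a covering option should bridge.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

src, dst = covering_option(A)
print(f"option: start at joint state {src}, terminate at joint state {dst}")
```

On the path graph above, the Fiedler vector is monotone along the path, so the option connects the two endpoints; in the multi-agent setting described in the abstract, the analogous endpoints are hard-to-reach joint states that require coordinated behavior from a sub-group of agents.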