Paper Title
Depthwise Convolution for Multi-Agent Communication with Enhanced Mean-Field Approximation
Paper Authors
Paper Abstract
Multi-agent settings remain a fundamental challenge in the reinforcement learning (RL) domain due to partial observability and the lack of accurate real-time interactions across agents. In this paper, we propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge in settings where a large number of agents coexist. First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations and to learn local communication between neighboring agents. To facilitate multi-agent coordination, we explicitly learn the effect of joint actions by taking the policies of neighboring agents as inputs. Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions. To coordinate the behaviors of neighboring agents more effectively, we enhance the mean-field approximation with a supervised policy rectification network (PRN) that rectifies real-time agent interactions and with a learnable compensation term that corrects the approximation bias. The proposed method enables efficient coordination and outperforms several baseline approaches on the adaptive traffic signal control (ATSC) task and the StarCraft II multi-agent challenge (SMAC).
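The abstract does not give implementation details, but the two core ideas lend themselves to a brief illustration. Below is a minimal PyTorch sketch, not the authors' code, of (1) a depthwise convolution that aggregates each agent's features over its immediate grid neighborhood, standing in for learned local communication, and (2) a mean-field input formed by averaging neighboring agents' policies, standing in for the effect of the joint action. The grid layout, all module and parameter names (LocalCommBlock, hidden_dim, q_head), and the box-filter mean are illustrative assumptions.

```python
# Sketch of depthwise-convolution communication with a mean-field policy
# input, assuming agents are arranged on a 2-D grid (e.g., intersections
# in ATSC). Not the paper's implementation; names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalCommBlock(nn.Module):
    def __init__(self, hidden_dim: int, n_actions: int, kernel_size: int = 3):
        super().__init__()
        # groups=hidden_dim makes the convolution depthwise: each feature
        # channel is filtered independently over the 3x3 agent neighborhood,
        # i.e., communication stays local to adjacent agents.
        self.depthwise = nn.Conv2d(hidden_dim, hidden_dim, kernel_size,
                                   padding=kernel_size // 2, groups=hidden_dim)
        # pointwise 1x1 convolution mixes channels after spatial aggregation
        self.pointwise = nn.Conv2d(hidden_dim, hidden_dim, 1)
        # per-agent Q head over communicated features + mean neighbor policy
        self.q_head = nn.Linear(hidden_dim + n_actions, n_actions)

    def forward(self, feats: torch.Tensor, policies: torch.Tensor):
        # feats:    (B, hidden_dim, H, W) per-agent hidden features
        # policies: (B, n_actions, H, W) per-agent action distributions
        comm = F.relu(self.pointwise(self.depthwise(feats)))
        # mean-field term: average the policies in each 3x3 neighborhood,
        # a crude stand-in for the joint action of neighboring agents
        mean_policy = F.avg_pool2d(policies, 3, stride=1, padding=1)
        joint = torch.cat([comm, mean_policy], dim=1)  # (B, C + A, H, W)
        joint = joint.permute(0, 2, 3, 1)              # one vector per agent
        return self.q_head(joint)                      # (B, H, W, n_actions)

# usage on a 4x4 grid of agents
block = LocalCommBlock(hidden_dim=32, n_actions=5)
q = block(torch.randn(2, 32, 4, 4),
          torch.softmax(torch.randn(2, 5, 4, 4), dim=1))
print(q.shape)  # torch.Size([2, 4, 4, 5])
```

The sketch omits the paper's policy rectification network and learnable compensation term; in the abstract's terms, those would replace the fixed box-filter average above with a supervised correction of the neighbors' real-time policies plus a learned bias-correction term.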