Paper Title

Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games

Authors

Tyler Malloy, Tim Klinger, Miao Liu, Matthew Riemer, Gerald Tesauro, Chris R. Sims

Abstract

This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm. Previous research with a related approach in continuous control experiments suggests that this method favors learning policies that are more robust to changing environment dynamics. The multi-agent game setting naturally requires this type of robustness, as other agents' policies change throughout learning, introducing a nonstationary environment. For this reason, recent methods in continual learning are compared to our approach, termed Capacity-Limited MADDPG. Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments.
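To make the core idea concrete, below is a minimal sketch of how an information-theoretic penalty on policy complexity might be folded into an actor update. This is not the paper's implementation: the function and parameter names (`capacity_limited_actor_loss`, `actor`, `critic`, `beta`), the Gaussian policy head, and the fixed standard-normal prior are all illustrative assumptions; the paper adapts a deterministic-policy algorithm (MADDPG) with a centralized critic, and its exact regularizer may differ.

```python
import torch

# Hypothetical sketch of a capacity-limited actor loss, assuming a
# stochastic (Gaussian) policy head so the information cost can be
# expressed as a KL divergence to a fixed action prior. All names
# here are illustrative, not the paper's API.

def capacity_limited_actor_loss(actor, critic, obs, beta=1e-3):
    """Actor loss = -Q(s, a) + beta * KL(pi(.|s) || prior).

    The KL term penalizes how much information the policy encodes
    about the state, in the spirit of policy information
    regularization; beta trades off return against policy complexity.
    """
    mean, log_std = actor(obs)                  # Gaussian policy head
    dist = torch.distributions.Normal(mean, log_std.exp())
    action = dist.rsample()                     # reparameterized sample

    # Fixed, state-independent prior over actions (standard normal here).
    prior = torch.distributions.Normal(torch.zeros_like(mean),
                                       torch.ones_like(mean))
    info_cost = torch.distributions.kl_divergence(dist, prior).sum(-1)

    # Centralized critic evaluates the sampled action, as in MADDPG.
    q_value = critic(obs, action).squeeze(-1)
    return (-q_value + beta * info_cost).mean()
```

With `beta = 0` this reduces to an ordinary policy-gradient actor loss; increasing `beta` drives the policy toward the prior, limiting its effective capacity, which is the mechanism the abstract credits for robustness to the non-stationarity introduced by other learning agents.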
