Paper Title

Individual-Level Inverse Reinforcement Learning for Mean Field Games

Authors

Yang Chen, Libo Zhang, Jiamou Liu, Shuyue Hu

Abstract

The recent mean field game (MFG) formalism has enabled the application of inverse reinforcement learning (IRL) methods in large-scale multi-agent systems, with the goal of inferring reward signals that can explain demonstrated behaviours of large populations. The existing IRL methods for MFGs are built upon reducing an MFG to a Markov decision process (MDP) defined on the collective behaviours and average rewards of the population. However, this paper reveals that the reduction from MFG to MDP holds only for the fully cooperative setting. This limitation invalidates existing IRL methods on MFGs with non-cooperative environments. To measure more general behaviours in large populations, we study the use of individual behaviours to infer ground-truth reward functions for MFGs. We propose Mean Field IRL (MFIRL), the first dedicated IRL framework for MFGs that can handle both cooperative and non-cooperative environments. Based on this theoretically justified framework, we develop a practical algorithm effective for MFGs with unknown dynamics. We evaluate MFIRL on both cooperative and mixed cooperative-competitive scenarios with many agents. Results demonstrate that MFIRL excels in reward recovery, sample efficiency and robustness in the face of changing dynamics.
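To make the setting described in the abstract concrete, the sketch below sets up a toy finite mean field game in which each agent's reward depends on its own state-action pair and on the population's state distribution (the mean field), and rolls the mean field forward under a shared policy. All names and the specific reward form here are illustrative assumptions; this is not the paper's MFIRL algorithm or its benchmark environments.

```python
import numpy as np

# Toy finite mean field game (hypothetical names, for illustration only).
n_states, n_actions = 3, 2
rng = np.random.default_rng(0)

# Individual-level reward r(s, a, mu): depends on the agent's own state-action
# pair and on the population's state distribution mu (the mean field).
def reward(state, action, mean_field):
    # Example choice: a congestion penalty plus a small per-action term.
    return -mean_field[state] + 0.1 * action

# Mean-field flow: the population distribution evolves under the shared policy.
def next_mean_field(mean_field, policy, transition):
    # transition[s, a, s'] = P(s' | s, a); policy[s, a] = pi(a | s)
    flow = np.einsum("s,sa,sax->x", mean_field, policy, transition)
    return flow / flow.sum()

transition = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
policy = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform policy
mu = np.full(n_states, 1.0 / n_states)                    # initial mean field

for t in range(5):
    r = np.array([[reward(s, a, mu) for a in range(n_actions)]
                  for s in range(n_states)])
    avg_r = float((mu[:, None] * policy * r).sum())
    print(f"t={t}  mean field={np.round(mu, 3)}  avg reward={avg_r:.3f}")
    mu = next_mean_field(mu, policy, transition)
```

Under this kind of model, an individual-level IRL method would try to recover the function `reward` from demonstrated individual trajectories, rather than from the population's aggregate behaviour and average reward.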
