Paper Title

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

Paper Authors

Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha

Abstract

Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities. Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?" By posing this counterfactual question, we can recognize state-action trajectories within sub-groups of entities that we may have encountered in another task and use what we learned in that task to inform our prediction in the current one. We then reconstruct a prediction of the full returns as a combination of factors considering these disjoint groups of entities and train this "randomly factorized" value function as an auxiliary objective for value-based multi-agent reinforcement learning. By doing so, our model can recognize and leverage similarities across tasks to improve learning efficiency in a multi-task setting. Our approach, Randomized Entity-wise Factorization for Imagined Learning (REFIL), outperforms all strong baselines by a significant margin in challenging multi-task StarCraft micromanagement settings.
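The core mechanism the abstract describes — randomly partitioning observed entities into disjoint sub-groups, scoring each group separately, and recombining the factors into an auxiliary value estimate — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `utility` function here is a hypothetical stand-in (a simple feature average) for REFIL's learned attention-based utility networks, and observations are simplified to one scalar feature per entity.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_entity_masks(n_entities, p=0.5):
    """Randomly split entities into two disjoint sub-groups.

    Returns a pair of boolean masks (m, ~m) so every entity belongs to
    exactly one group, mirroring the random "imagined" partition.
    """
    m = rng.random(n_entities) < p
    return m, ~m

def utility(agent_obs, mask):
    """Hypothetical per-agent utility over a masked entity subset.

    Stand-in for a learned utility network: averages the features of
    the entities visible under the mask (0 if the group is empty).
    """
    if not mask.any():
        return 0.0
    return agent_obs[mask].mean()

def factorized_value(obs, n_agents):
    """Auxiliary value: combination of factors over the disjoint groups.

    obs: (n_agents, n_entities) feature matrix. Each agent's utility is
    computed twice -- once per entity sub-group -- and the factors are
    summed into a single "randomly factorized" value estimate.
    """
    n_entities = obs.shape[1]
    in_group, out_group = random_entity_masks(n_entities)
    q_in = sum(utility(obs[i], in_group) for i in range(n_agents))
    q_out = sum(utility(obs[i], out_group) for i in range(n_agents))
    return q_in + q_out
```

In the paper this factorized estimate is trained alongside the full value function as an auxiliary objective; the sketch above only shows the partition-and-recombine structure that makes sub-group experience reusable across tasks with different entity counts.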
