Paper Title
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
Paper Authors
Paper Abstract
Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync.
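The first challenge named in the abstract, that decentralized sampling from per-agent (marginal) policies cannot express every joint action policy, can be illustrated with a small sketch. This is not the paper's SYNC-policy implementation. The two-agent setup, the action names "left" and "right", and the helper functions `product_joint` and `marginals` are hypothetical and chosen only for illustration. The sketch assumes a target joint policy in which both agents must always pick the same action, and shows that sampling each agent independently from the matching marginals puts half of the probability mass on mismatched (uncoordinated) action pairs.

```python
from itertools import product


def product_joint(p1, p2):
    """Joint distribution obtained when two agents sample independently
    from their own marginal policies p1 and p2."""
    return {(a, b): p1[a] * p2[b] for a, b in product(p1, p2)}


def marginals(joint):
    """Per-agent marginal policies implied by a joint action distribution."""
    p1, p2 = {}, {}
    for (a, b), p in joint.items():
        p1[a] = p1.get(a, 0.0) + p
        p2[b] = p2.get(b, 0.0) + p
    return p1, p2


# Hypothetical target: the agents always move in the same direction.
target = {
    ("left", "left"): 0.5,
    ("right", "right"): 0.5,
    ("left", "right"): 0.0,
    ("right", "left"): 0.0,
}

# Each agent's marginal is uniform over {"left", "right"}.
p1, p2 = marginals(target)

# Independent sampling from those marginals yields a product distribution.
independent = product_joint(p1, p2)

# Probability that the two agents pick different directions.
mismatch = independent[("left", "right")] + independent[("right", "left")]
print(mismatch)  # 0.5, whereas the target joint policy has mismatch 0.0
```

Under these assumptions, no choice of independent marginals reproduces the target: a product distribution that assigns positive probability to both ("left", "left") and ("right", "right") must also assign positive probability to the mismatched pairs.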