Paper Title
Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination
Paper Authors
Paper Abstract
Cooperative artificial intelligence with human or superhuman proficiency in collaborative tasks stands at the frontier of machine learning research. Prior work has tended to evaluate cooperative AI performance under the restrictive paradigms of self-play (teams composed of agents trained together) and cross-play (teams of agents trained independently but using the same algorithm). Recent work has indicated that AI optimized for these narrow settings may make for undesirable collaborators in the real world. We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play, in which agents are evaluated on teaming performance with all other agents in an experiment pool, with no assumption of algorithmic similarity between agents. We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, underperform in this paradigm. We propose the Any-Play learning augmentation -- a multi-agent extension of diversity-based intrinsic rewards for zero-shot coordination (ZSC) -- for generalizing self-play-based algorithms to the inter-algorithm cross-play setting. We apply the Any-Play learning augmentation to the Simplified Action Decoder (SAD) and demonstrate state-of-the-art performance in the collaborative card game Hanabi.
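The two technical ideas in the abstract can be made concrete with short sketches. First, a minimal sketch of the inter-algorithm cross-play criterion: every agent in an experiment pool is paired with every other agent, regardless of training algorithm, and scored on team performance. The function name and the `evaluate_team` callback below are illustrative placeholders, not taken from the paper.

```python
import itertools
import numpy as np

def inter_algorithm_cross_play(agents, evaluate_team):
    """Score every ordered pairing of distinct agents in the pool.

    `evaluate_team(a, b)` is a placeholder: run episodes with agents
    `a` and `b` on the shared task and return their mean team score.
    """
    n = len(agents)
    scores = np.full((n, n), np.nan)
    for i, j in itertools.permutations(range(n), 2):
        scores[i, j] = evaluate_team(agents[i], agents[j])
    per_agent = np.nanmean(scores, axis=1)  # mean score over all partners
    return scores, per_agent
```

Second, a hedged sketch of a diversity-based intrinsic reward of the kind Any-Play extends to the multi-agent setting. The DIAYN-style form below (a discriminator q(z | behavior) whose log-likelihood is added to the environment reward) is an assumption for illustration; the class name, `beta`, and network shape are invented here, not drawn from the paper.

```python
import torch
import torch.nn as nn

class StyleDiscriminator(nn.Module):
    """Guesses which latent 'style' z conditioned the partner's policy."""

    def __init__(self, obs_dim: int, num_styles: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_styles),
        )

    def forward(self, partner_obs: torch.Tensor) -> torch.Tensor:
        return self.net(partner_obs)  # logits over latent styles

def augmented_reward(env_reward, discriminator, partner_obs, style_idx, beta=0.1):
    """r_total = r_env + beta * log q(z | partner behavior) -- illustrative form."""
    log_q = torch.log_softmax(discriminator(partner_obs), dim=-1)
    intrinsic = log_q.gather(-1, style_idx.unsqueeze(-1)).squeeze(-1)
    return env_reward + beta * intrinsic
```

Rewarding teams whose sampled style is identifiable from behavior pushes a self-play population toward distinct conventions, which is the intuition behind using diversity to generalize to unfamiliar partners.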