Title

Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners

Authors

Rodrigo Canaan, Xianbo Gao, Youjin Chung, Julian Togelius, Andy Nealen, Stefan Menzel

Abstract

Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior. While there are agents that can achieve near-perfect scores in the game by agreeing on some shared strategy, comparatively little progress has been made in ad-hoc cooperation settings, where partners and strategies are not known in advance. In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training and, conversely, when these agents are trained to play with any individual rule-based agent, or even a mix of these agents, they fail to achieve good self-play scores.
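The evaluation setting described in the abstract, where an agent is scored with partners it did not train with, is commonly summarized as a cross-play matrix: each row is an agent, each column is a partner, the diagonal holds self-play scores and the off-diagonal cells hold ad-hoc pairings. The following is a minimal sketch under loudly stated assumptions. `toy_episode` is a hypothetical placeholder, not Hanabi, and its scores are arbitrary numbers chosen only so the example runs; they are not results from the paper. The agent names are likewise hypothetical. A real evaluation would replace `toy_episode` with full games between actual agents.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical: an "agent" is reduced to the name of the convention it follows.
Agent = str


def toy_episode(a: Agent, b: Agent, rng: random.Random) -> int:
    """Hypothetical placeholder for one two-player game (NOT Hanabi).

    Returns an arbitrary score in [0, 25] that is high when both partners
    follow the same convention and low otherwise.
    """
    base = 22 if a == b else 8
    return max(0, min(25, base + rng.randint(-3, 3)))


def cross_play_matrix(
    agents: List[Agent],
    episode: Callable[[Agent, Agent, random.Random], int],
    n_games: int = 100,
    seed: int = 0,
) -> Dict[Agent, Dict[Agent, float]]:
    """Mean score for every (agent, partner) pairing.

    Diagonal cells (agent paired with itself) are self-play scores;
    off-diagonal cells are ad-hoc pairings with a partner the agent
    may never have seen during training.
    """
    rng = random.Random(seed)
    return {
        a: {b: mean(episode(a, b, rng) for _ in range(n_games)) for b in agents}
        for a in agents
    }


if __name__ == "__main__":
    names = ["self_play_agent", "rule_based_a", "rule_based_b"]  # hypothetical
    matrix = cross_play_matrix(names, toy_episode, n_games=200)
    for row in names:
        print(row, [round(matrix[row][col], 1) for col in names])
```

Keeping the episode function as a parameter separates the evaluation protocol from the game itself, so the same matrix code can be reused once a real environment and trained agents are available.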