Paper Title
Battlesnake Challenge: A Multi-agent Reinforcement Learning Playground with Human-in-the-loop
Paper Authors
Paper Abstract
We present the Battlesnake Challenge, a framework for multi-agent reinforcement learning with Human-In-the-Loop Learning (HILL). It is built upon Battlesnake, a multiplayer extension of the classic Snake game in which two or more snakes compete to be the last one surviving. The Battlesnake Challenge consists of an offline module for model training and an online module for live competitions. We develop a simulated game environment for offline multi-agent model training and identify a set of baseline heuristics that can be instilled to improve learning. Our framework is agent-agnostic and heuristics-agnostic, so researchers can design their own algorithms, train their own models, and demonstrate them in the online Battlesnake competition. We validate the framework and baseline heuristics with preliminary experiments. Our results show that agents trained with the proposed HILL methods consistently outperform agents trained without HILL. In addition, the reward-manipulation heuristic performed best in the online competition. We open source our framework at https://github.com/awslabs/sagemaker-battlesnake-ai.
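To illustrate what "instilling a heuristic" via reward manipulation could look like in practice, below is a minimal Python sketch of a gym-style environment wrapper that adds a human-designed shaping penalty when a snake's head approaches a wall. This is an illustrative assumption, not code from the sagemaker-battlesnake-ai repository: the wrapper class, the `wall_penalty` parameter, and the `head` / `board_size` entries assumed in `info` are hypothetical.

```python
# Hypothetical sketch of a reward-manipulation heuristic (HILL) applied as a
# gym environment wrapper; names and info fields are illustrative only.
import gym


class RewardManipulationWrapper(gym.Wrapper):
    """Adds a human-designed shaping term to the environment reward,
    e.g. penalizing moves that place the snake's head next to a wall."""

    def __init__(self, env, wall_penalty=-0.1):
        super().__init__(env)
        self.wall_penalty = wall_penalty

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Assumption: the environment reports the head position and board
        # size through `info`; defaults are placeholders for illustration.
        head_x, head_y = info.get("head", (1, 1))
        width, height = info.get("board_size", (11, 11))
        if head_x in (0, width - 1) or head_y in (0, height - 1):
            reward += self.wall_penalty
        return obs, reward, done, info
```

Because the heuristic lives in a wrapper around the environment rather than inside the learning algorithm, the same agent code can be trained with or without the shaping term, which matches the agent-agnostic and heuristics-agnostic design described in the abstract.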