Paper title
DIAMBRA Arena: a New Reinforcement Learning Platform for Research and Experimentation
Paper authors
Paper abstract
Recent advances in reinforcement learning have led to effective methods that achieve above-human-level performance in very complex environments. However, once solved, these environments become less valuable, and new challenges with different or more complex scenarios are needed to support research progress. This work presents DIAMBRA Arena, a new platform for reinforcement learning research and experimentation, featuring a collection of high-quality environments that expose a Python API fully compliant with the OpenAI Gym standard. The environments are episodic tasks with discrete actions and observations composed of raw pixels plus additional numerical values. All of them support both single-player and two-player modes, making it possible to work on standard reinforcement learning, competitive multi-agent settings, human-agent competition, self-play, human-in-the-loop training, and imitation learning. Software capabilities are demonstrated by successfully training multiple deep reinforcement learning agents with proximal policy optimization, obtaining human-like behavior. The results confirm the utility of DIAMBRA Arena as a reinforcement learning research tool, providing environments designed to study some of the most challenging topics in the field.
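To illustrate what an episodic, Gym-style interaction with discrete actions and mixed observations (pixels plus numerical values) might look like, here is a minimal sketch. It does not use the DIAMBRA Arena API: `ToyFightingEnv`, its observation keys (`frame`, `health`, `step`), its reward rule, and `run_episode` are all hypothetical stand-ins. They follow only the classic OpenAI Gym convention, in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`.

```python
import random


class ToyFightingEnv:
    """Hypothetical stand-in environment. This is NOT the DIAMBRA Arena API.

    It mimics the classic Gym interface: reset() -> obs and
    step(action) -> (obs, reward, done, info).
    """

    N_ACTIONS = 4  # size of the discrete action set (assumed for illustration)

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.health = 100

    def _observation(self):
        # A tiny 4x4 grayscale "frame" plus additional numerical values.
        frame = [[self.rng.randrange(256) for _ in range(4)] for _ in range(4)]
        return {"frame": frame, "health": self.health, "step": self.steps}

    def reset(self):
        self.steps = 0
        self.health = 100
        return self._observation()

    def step(self, action):
        if not 0 <= action < self.N_ACTIONS:
            raise ValueError(f"invalid action: {action}")
        self.steps += 1
        # Arbitrary toy dynamics: lose a random amount of health each step.
        self.health = max(0, self.health - self.rng.randrange(30))
        reward = 1.0 if action == 0 else 0.0  # arbitrary toy reward rule
        done = self.health == 0 or self.steps >= self.max_steps
        return self._observation(), reward, done, {}


def run_episode(env, policy):
    """Run one episode and return (total_reward, final_observation)."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
    return total_reward, obs


if __name__ == "__main__":
    env = ToyFightingEnv(max_steps=5)
    rng = random.Random(1)
    # Random agent: ignores the observation and samples a discrete action.
    total, last_obs = run_episode(env, lambda obs: rng.randrange(env.N_ACTIONS))
    print("episode return:", total, "steps:", last_obs["step"])
```

A trained agent would replace the random policy with a function that maps the observation dictionary to an action. The loop itself stays the same, which is the practical benefit of a Gym-compliant interface.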