TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning

Paper Title

TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning

Authors

Peng Sun, Jiechao Xiong, Lei Han, Xinghai Sun, Shuxing Li, Jiawei Xu, Meng Fang, Zhengyou Zhang

Abstract

Competitive Self-Play (CSP) based Multi-Agent Reinforcement Learning (MARL) has recently shown phenomenal breakthroughs. Strong AIs have been achieved on several benchmarks, including Dota 2, Glory of Kings, Quake III, and StarCraft II, to name a few. Despite this success, MARL training is extremely data-hungry, typically requiring billions (if not trillions) of frames from the environment to learn a high-performance agent. This poses non-trivial difficulties for researchers and engineers and prevents the application of MARL to a broader range of real-world problems. To address this issue, in this manuscript we describe a framework, referred to as TLeague, that targets large-scale training and implements several mainstream CSP-MARL algorithms. Training can be deployed on either a single machine or a cluster of hybrid machines (CPUs and GPUs), with standard Kubernetes supported in a cloud-native manner. TLeague achieves high throughput and a reasonable scale-up when performing distributed training. Thanks to its modular design, it is also easy to extend for solving other multi-agent problems or for implementing and verifying MARL algorithms. We present experiments on StarCraft II, ViZDoom and Pommerman to show the efficiency and effectiveness of TLeague. The code is open-sourced and available at https://github.com/tencent-ailab/tleague_projpage
