Paper Title
Scaling Laws for a Multi-Agent Reinforcement Learning Model
Paper Authors
Paper Abstract
The recent observation of neural power-law scaling relations has made a significant impact in the field of deep learning. As a consequence, substantial attention has been devoted to the description of scaling laws, although mostly for supervised learning and only to a lesser extent for reinforcement learning frameworks. In this paper we present an extensive study of performance scaling for a cornerstone reinforcement learning algorithm, AlphaZero. On the basis of a relationship between Elo rating, playing strength, and power-law scaling, we train AlphaZero agents on the games Connect Four and Pentago and analyze their performance. We find that player strength scales as a power law in neural network parameter count when not bottlenecked by available compute, and as a power of compute when training optimally sized agents. We observe nearly identical scaling exponents for both games. Combining the two observed scaling laws, we obtain a power law relating optimal model size to compute, similar to the ones observed for language models. We find that the predicted scaling of optimal neural network size fits our data for both games. This scaling law implies that previously published state-of-the-art game-playing models are significantly smaller than their optimal size, given the respective compute budgets. We also show that large AlphaZero models are more sample efficient, performing better than smaller models trained on the same amount of data.
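
The abstract relates Elo rating, playing strength, and power-law scaling. Below is a minimal Python sketch, not the authors' code, of how such fits could be carried out: it uses the standard Elo-to-win-probability relation, converts Elo to a Bradley-Terry strength (gamma = 10^(Elo/400)), and estimates power-law exponents by linear regression in log-log space. All parameter counts, Elo values, and compute budgets below are hypothetical placeholders, not data from the paper.

# Sketch of Elo-based power-law fits (illustrative only; hypothetical data).
import numpy as np

def elo_win_probability(elo_a, elo_b):
    """Standard Elo expected score: P(player A beats player B)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

def playing_strength(elo):
    """Bradley-Terry strength gamma = 10^(Elo/400); a power law in gamma
    corresponds to Elo growing linearly in log(parameter count)."""
    return 10.0 ** (np.asarray(elo) / 400.0)

def fit_power_law_exponent(x, y):
    """Fit y ~ a * x^alpha via linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
    return slope, 10.0 ** intercept

# Hypothetical agent sizes (parameter counts) and measured Elo ratings.
params = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
elos = np.array([800.0, 950.0, 1100.0, 1250.0, 1400.0])

# Strength gamma should follow gamma ~ N^alpha_N when training is not
# bottlenecked by available compute.
alpha_n, _ = fit_power_law_exponent(params, playing_strength(elos))
print(f"estimated size-scaling exponent alpha_N = {alpha_n:.3f}")

# Analogously, optimal model size versus compute budget: N_opt ~ C^beta.
compute = np.array([1e12, 1e13, 1e14, 1e15])      # hypothetical compute budgets
optimal_sizes = np.array([2e4, 8e4, 3e5, 1.2e6])  # hypothetical optimal sizes
beta, _ = fit_power_law_exponent(compute, optimal_sizes)
print(f"estimated optimal-size exponent beta = {beta:.3f}")

A fit like this makes the abstract's claim concrete: if Elo grows linearly in the logarithm of parameter count (equivalently, strength is a power law in size) with nearly the same slope for Connect Four and Pentago, then combining that fit with the compute-scaling fit yields the optimal-size-versus-compute power law mentioned above.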