Paper Title
Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
Paper Authors
Paper Abstract
It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues because the sizes of the state and action spaces grow exponentially with the number of agents. In this paper, we identify a rich class of networked MARL problems in which the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that learns a near-optimal localized policy for optimizing the average reward, with complexity that scales with the state-action space size of local neighborhoods rather than that of the entire network. Our result centers on identifying and exploiting an exponential decay property, which ensures that the effect agents have on one another decays exponentially fast in their graph distance.
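A rough illustrative sketch of the exponential decay property mentioned above (the notation Q_i, N_i^κ, c, and ρ is assumed for exposition and is not quoted from the abstract): writing N_i^κ for the κ-hop neighborhood of agent i in the network graph, the property can be stated along the lines of

\[
  \bigl| Q_i(s, a) - Q_i(s', a') \bigr| \;\le\; c\,\rho^{\kappa + 1}
  \qquad \text{whenever } s_{N_i^{\kappa}} = s'_{N_i^{\kappa}} \text{ and } a_{N_i^{\kappa}} = a'_{N_i^{\kappa}},
\]

for some constants c > 0 and ρ ∈ (0, 1). Under such a condition, truncating each agent's Q-function to its κ-hop neighborhood incurs an error of only O(ρ^{κ+1}), which is what allows the method's complexity to scale with local neighborhoods rather than with the whole network.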