Paper Title

Resource Management in Wireless Networks via Multi-Agent Deep Reinforcement Learning

Paper Authors

Navid Naderializadeh, Jaroslaw Sydir, Meryem Simsek, Hosein Nikopour

Paper Abstract

We propose a mechanism for distributed resource management and interference mitigation in wireless networks using multi-agent deep reinforcement learning (RL). We equip each transmitter in the network with a deep RL agent that receives delayed observations from its associated users, while also exchanging observations with its neighboring agents, and decides on which user to serve and what transmit power to use at each scheduling interval. Our proposed framework enables agents to make decisions simultaneously and in a distributed manner, unaware of the concurrent decisions of other agents. Moreover, our design of the agents' observation and action spaces is scalable, in the sense that an agent trained on a scenario with a specific number of transmitters and users can be applied to scenarios with different numbers of transmitters and/or users. Simulation results demonstrate the superiority of our proposed approach compared to decentralized baselines in terms of the tradeoff between average and $5^{th}$ percentile user rates, while achieving performance close to, and even in certain cases outperforming, that of a centralized information-theoretic baseline. We also show that our trained agents are robust and maintain their performance gains when experiencing mismatches between train and test deployments.
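To make the decision process concrete, below is a minimal sketch of the per-transmitter agent loop the abstract describes: each agent combines delayed observations from its own users with observations exchanged with neighboring agents, and independently selects which user to serve and at what transmit power. This is an illustrative toy, not the authors' implementation: the `TransmitterAgent` class, the linear Q-function (standing in for the paper's deep RL network), the epsilon-greedy policy, the discrete power grid, and all dimensions are hypothetical assumptions.

```python
import numpy as np

NUM_USERS = 4                          # assumed number of users per transmitter
POWER_LEVELS = [0.0, 0.25, 0.5, 1.0]   # assumed discrete transmit-power grid

class TransmitterAgent:
    """One agent per transmitter, acting on local plus neighbor observations."""

    def __init__(self, obs_dim, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.n_actions = NUM_USERS * len(POWER_LEVELS)
        # Linear Q-function as a stand-in for the paper's deep network.
        self.w = self.rng.normal(scale=0.01, size=(obs_dim, self.n_actions))

    def act(self, local_obs, neighbor_obs, epsilon=0.1):
        """Pick a (user, power) action from delayed local observations and
        exchanged neighbor observations, without knowledge of the other
        agents' concurrent decisions."""
        obs = np.concatenate([local_obs, neighbor_obs.ravel()])
        if self.rng.random() < epsilon:
            action = int(self.rng.integers(self.n_actions))  # explore
        else:
            action = int(np.argmax(obs @ self.w))            # exploit Q-values
        user = action // len(POWER_LEVELS)
        power = POWER_LEVELS[action % len(POWER_LEVELS)]
        return user, power

# One scheduling interval: all agents decide simultaneously and independently.
rng = np.random.default_rng(42)
LOCAL_DIM, N_NEIGHBORS, NEIGH_DIM = 2 * NUM_USERS, 3, 5      # assumed sizes
obs_dim = LOCAL_DIM + N_NEIGHBORS * NEIGH_DIM
agents = [TransmitterAgent(obs_dim) for _ in range(6)]
for i, agent in enumerate(agents):
    local_obs = rng.normal(size=LOCAL_DIM)                    # delayed user feedback
    neighbor_obs = rng.normal(size=(N_NEIGHBORS, NEIGH_DIM))  # exchanged with neighbors
    user, power = agent.act(local_obs, neighbor_obs)
    print(f"TX {i}: serve user {user} at power {power:.2f}")
```

Note that each agent's observation vector has a fixed size (its own users plus a fixed number of nearest neighbors) regardless of the total network size, which mirrors the scalability property claimed in the abstract: a policy trained on one deployment size can, in principle, be applied to deployments with different numbers of transmitters and/or users.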
