Paper Title
Model-Free Prediction of Partially Observable Spatiotemporal Chaotic Systems
Decentralized Federated Reinforcement Learning for User-Centric Dynamic TFDD Control
Authors
Abstract
The explosive growth of dynamic and heterogeneous data traffic poses great challenges for 5G-and-beyond mobile networks. To enhance network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet asymmetric and heterogeneous traffic demands while alleviating inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. To jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm, named the federated Wolpertinger deep deterministic policy gradient (FWDDPG) algorithm. The BSs decide their local time-frequency configurations through RL algorithms and achieve global training by exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then use the Wolpertinger policy to reduce the errors in mapping from the continuous action space back to the discrete action space. Simulation results demonstrate the superiority of the proposed algorithm over benchmark algorithms in terms of system sum rate.
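The two mechanisms named in the abstract, mapping a continuous proto-action back to a discrete action and exchanging models with neighbors, can be illustrated with a minimal sketch. It is not the paper's implementation. The Wolpertinger step follows the commonly described form: take the k discrete actions nearest to the actor's output, then keep the one the critic scores highest. The neighbor exchange is assumed here to be a uniform parameter average, which the abstract does not specify. All names (`wolpertinger_select`, `average_with_neighbors`, `toy_critic`, the `bs0`/`bs1`/`bs2` labels) and the toy action encoding are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def wolpertinger_select(
    proto_action: Vector,
    discrete_actions: List[Vector],
    q_value: Callable[[Vector], float],
    k: int,
) -> int:
    """Return the index of the discrete action chosen for a continuous proto-action."""
    # Step 1: rank all discrete actions by Euclidean distance to the proto-action.
    ranked = sorted(
        range(len(discrete_actions)),
        key=lambda i: math.dist(proto_action, discrete_actions[i]),
    )
    candidates = ranked[:k]
    # Step 2: among the k nearest candidates, keep the one the critic scores highest.
    return max(candidates, key=lambda i: q_value(discrete_actions[i]))


def average_with_neighbors(
    params: Dict[str, List[float]],
    neighbors: List[str],
    me: str,
) -> List[float]:
    """Uniformly average one BS's parameter vector with its neighbors' vectors.

    Uniform weighting is an assumption made for illustration only.
    """
    group = [params[me]] + [params[n] for n in neighbors]
    return [sum(values) / len(group) for values in zip(*group)]


# Toy setup (hypothetical): each discrete action marks two resource blocks
# as downlink (0.0) or uplink (1.0).
actions = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]


def toy_critic(action: Vector) -> float:
    # Stand-in for a learned critic: simply prefers more uplink blocks.
    return action[0] + action[1]


nearest_only = wolpertinger_select((0.7, 0.4), actions, toy_critic, k=1)
refined = wolpertinger_select((0.7, 0.4), actions, toy_critic, k=2)

# Toy local models for three BSs; bs1 neighbors both bs0 and bs2.
local_models = {"bs0": [1.0, 2.0], "bs1": [3.0, 4.0], "bs2": [5.0, 6.0]}
merged_bs1 = average_with_neighbors(local_models, ["bs0", "bs2"], "bs1")
```

With `k=1` the selection reduces to a plain nearest-neighbor lookup. A larger `k` lets the critic override the nearest action when a slightly more distant one has a higher estimated value, which is how the mapping error from the continuous space is reduced.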