Learning-Based Multi-Channel Access in 5G and Beyond Networks with Fast Time-Varying Channels

Paper Title

Learning-Based Multi-Channel Access in 5G and Beyond Networks with Fast Time-Varying Channels

Authors

Wang, Shaoyang; Lv, Tiejun; Zhang, Xuewei; Lin, Zhipeng; Huang, Pingmu

Abstract

We propose a learning-based scheme to investigate the dynamic multi-channel access (DMCA) problem in fifth-generation (5G) and beyond networks with fast time-varying channels, wherein the channel parameters are unknown. The proposed learning-based scheme can maintain near-optimal performance for a long time, even in sharply changing channels. This scheme greatly reduces processing delay and effectively alleviates the error due to decision lag, which is caused by the non-immediacy of information acquisition and processing. We first propose a psychology-based personalized quality of service model after introducing the network model with unknown channel parameters and the streaming model. Then, two access criteria are presented, one for the live streaming model and one for the buffered streaming model, and their corresponding optimization problems are formulated. The optimization problems are solved by a learning-based DMCA scheme, which combines a recurrent neural network with deep reinforcement learning. In the learning-based DMCA scheme, the agent mainly invokes the proposed prediction-based deep deterministic policy gradient algorithm as the learning algorithm. As a novel technical paradigm, our scheme has strong universality, since it can be easily extended to solve other problems in wireless communications. Simulation results based on real channel data validate that the performance of the learning-based scheme approaches that of exhaustive search when a decision is made at each time-slot, and is superior to the exhaustive search method when a decision is made only every few time-slots.
