Paper Title
Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV based Random Access IoT Networks with NOMA
Paper Authors
Paper Abstract
In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique to improve massive channel access in a wireless IoT network where solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to remote servers. Specifically, IoT devices contend for access to the shared wireless channel using an adaptive $p$-persistent slotted Aloha protocol, while the solar-powered UAVs adopt Successive Interference Cancellation (SIC) to decode multiple data streams received from IoT devices and thereby improve access efficiency. To enable an energy-sustainable, capacity-optimal network, we study the joint problem of dynamic multi-UAV altitude control and multi-cell wireless channel access management of IoT devices as a stochastic control problem with multiple energy constraints. To learn an optimal control policy, we first formulate this problem as a Constrained Markov Decision Process (CMDP), and then propose an online model-free Constrained Deep Reinforcement Learning (CDRL) algorithm based on Lagrangian primal-dual policy optimization to solve the CMDP. Extensive simulations demonstrate that our proposed algorithm learns a cooperative policy among UAVs in which the altitudes of the UAVs and the channel access probabilities of the IoT devices are dynamically and jointly controlled to attain the maximal long-term network capacity while maintaining energy sustainability of the UAVs. The proposed algorithm outperforms deep RL solutions that use reward shaping to account for energy costs: it achieves a temporal average system capacity that is $82.4\%$ higher than that of a feasible DRL-based solution, and only $6.47\%$ lower than that of the energy-constraint-free system.
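As a point of reference for the CMDP and Lagrangian primal-dual approach mentioned in the abstract, the following is a generic sketch of the standard Lagrangian relaxation underlying primal-dual policy optimization; the symbols ($r$, $c_i$, $d_i$, $\lambda_i$) are illustrative placeholders and not taken verbatim from the paper:

$$
\max_{\pi}\ \min_{\boldsymbol{\lambda}\,\ge\, 0}\ \mathcal{L}(\pi,\boldsymbol{\lambda})
= \mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t}\gamma^{t}\, r(s_t,a_t)\Big]
- \sum_{i}\lambda_i\Big(\mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t}\gamma^{t}\, c_i(s_t,a_t)\Big] - d_i\Big),
$$

where, in this setting, $r$ would correspond to the instantaneous network capacity, $c_i$ to the energy cost of UAV $i$, and $d_i$ to its energy budget. Primal-dual policy optimization alternates policy-gradient ascent on $\mathcal{L}$ with respect to the policy parameters and gradient descent on $\mathcal{L}$ with respect to the multipliers (projected onto $\lambda_i \ge 0$), so each $\lambda_i$ grows while the corresponding energy constraint is violated and shrinks once it is satisfied.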