Paper Title
SREC: Proactive Self-Remedy of Energy-Constrained UAV-Based Networks via Deep Reinforcement Learning
Paper Authors
Paper Abstract
Energy-aware control of multiple unmanned aerial vehicles (UAVs) is one of the major research interests in UAV-based networking. Yet few existing works have focused on how the network should react around the time when the UAV lineup changes. In this work, we study proactive self-remedy of energy-constrained UAV networks when one or more UAVs are short of energy and about to quit for charging. We target an energy-aware optimal UAV control policy that proactively relocates the UAVs when any UAV is about to quit the network, rather than passively dispatching the remaining UAVs after the departure. Specifically, a deep reinforcement learning (DRL)-based self-remedy approach, named SREC-DRL, is proposed to maximize the accumulated user satisfaction score over a period within which at least one UAV will quit the network. To handle the continuous state and action spaces of the problem, the state-of-the-art actor-critic DRL algorithm, deep deterministic policy gradient (DDPG), is applied for better convergence stability. Numerical results demonstrate that, compared with the passive reaction method, the proposed SREC-DRL approach achieves a $12.12\%$ gain in cumulative user satisfaction score during the remedy period.
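The abstract's choice of DDPG rests on two mechanics that suit continuous state and action spaces: a deterministic actor that maps a continuous state directly to a continuous action, and slowly tracking target networks (Polyak averaging) that stabilize convergence. The sketch below illustrates only these two ingredients with NumPy; the state/action dimensions, the `tanh` action bound, and the update rate `TAU` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4    # e.g., UAV position/energy features (assumed)
ACTION_DIM = 2   # e.g., a 2-D velocity command (assumed)
TAU = 0.005      # soft-update rate, a common DDPG default (assumed)

# Single linear layers stand in for the actor network and its target copy.
actor_w = rng.normal(size=(STATE_DIM, ACTION_DIM))
target_w = actor_w.copy()

def act(state, w):
    """Deterministic policy: continuous action from a continuous state.

    tanh bounds each action component to [-1, 1], the usual way DDPG
    keeps actions inside a box-constrained action space.
    """
    return np.tanh(state @ w)

def soft_update(target, online, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return tau * online + (1.0 - tau) * target

state = rng.normal(size=STATE_DIM)
action = act(state, actor_w)

# Pretend a critic-driven gradient step changed the online actor,
# then let the target network track it slowly.
actor_w = actor_w + 0.01 * rng.normal(size=actor_w.shape)
target_w = soft_update(target_w, actor_w)
```

Because the target network moves only a small step toward the online network each update, the bootstrapped critic targets change slowly, which is the source of the "better convergence stability" the abstract attributes to DDPG.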