Eco-Vehicular Edge Networks for Connected Transportation: A Distributed Multi-Agent Reinforcement Learning Approach

Paper title

Eco-Vehicular Edge Networks for Connected Transportation: A Distributed Multi-Agent Reinforcement Learning Approach

Authors

Md Ferdous Pervej, Shih-Chun Lin

Abstract

This paper introduces an energy-efficient, software-defined vehicular edge network for the growing intelligent connected transportation system. A joint user-centric virtual cell formation and resource allocation problem is investigated to bring eco-solutions to the edge. The joint problem aims to combat power-hungry edge nodes while maintaining assured reliability and data rate. More specifically, by prioritizing the downlink communication of dynamic eco-routing, highly mobile autonomous vehicles are served by multiple low-powered access points (APs) simultaneously for ubiquitous connectivity and guaranteed network reliability. Because of its complicated combinatorial structure, the formulated optimization problem is extremely difficult to solve in polynomial time. Hence, a distributed multi-agent reinforcement learning (D-MARL) algorithm is proposed for eco-vehicular edges, in which multiple agents cooperatively learn to obtain the best reward. First, the algorithm segments the centralized action space into multiple smaller groups. Using a model-free distributed Q learner, each edge agent takes its actions from its respective group. In addition, in each learning state, a software-defined controller chooses the global best action from the individual bests of the distributed agents. Numerical results validate that the learning solution achieves near-optimal performance within a small number of training episodes, compared with existing baselines.
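The three steps named in the abstract (split the centralized action space into groups, let each agent run a Q learner over its own group, let a controller pick the global best among the agents' individual bests) can be illustrated with a minimal tabular sketch. This is not the authors' implementation. Everything concrete here is an assumption made for illustration: the toy states and reward (`TARGET`, `toy_reward`), the round-robin partition, the hyperparameters, the use of the controller's global best value as the bootstrap target, and exploration driven only by optimistic zero initialization instead of an explicit exploration schedule.

```python
# Hypothetical toy problem: in each state exactly one action earns reward 0,
# every other action earns a negative reward. Not taken from the paper.
TARGET = {0: 4, 1: 1, 2: 5, 3: 0}


def toy_reward(state, action):
    return -abs(action - TARGET[state])


def partition_actions(actions, num_agents):
    """Split the centralized action list into disjoint groups (round-robin)."""
    return [actions[i::num_agents] for i in range(num_agents)]


class EdgeAgent:
    """Tabular Q learner that only ever considers actions in its own group."""

    def __init__(self, group, alpha=0.5, gamma=0.9):
        self.group = group
        self.alpha = alpha
        self.gamma = gamma
        self.q = {}  # (state, action) -> value; missing entries count as 0.0

    def value(self, state, action):
        return self.q.get((state, action), 0.0)

    def best(self, state):
        """Return this agent's individual best (action, value) for a state."""
        action = max(self.group, key=lambda a: self.value(state, a))
        return action, self.value(state, action)

    def update(self, state, action, reward, next_value):
        """Standard Q-learning update; next_value is supplied by the controller."""
        old = self.value(state, action)
        target = reward + self.gamma * next_value
        self.q[(state, action)] = old + self.alpha * (target - old)


def controller_select(agents, state):
    """Pick the global best (agent index, action, value) among individual bests."""
    proposals = [(i, *agent.best(state)) for i, agent in enumerate(agents)]
    return max(proposals, key=lambda p: p[2])


def train(num_states=4, num_actions=6, num_agents=3, steps=400):
    groups = partition_actions(list(range(num_actions)), num_agents)
    agents = [EdgeAgent(group) for group in groups]
    state = 0
    for _ in range(steps):
        idx, action, _ = controller_select(agents, state)
        reward = toy_reward(state, action)
        next_state = (state + 1) % num_states  # toy deterministic transition
        next_value = controller_select(agents, next_state)[2]
        # Only the agent whose action was executed learns from the outcome.
        agents[idx].update(state, action, reward, next_value)
        state = next_state
    return agents


def greedy_policy(agents, num_states=4):
    return {s: controller_select(agents, s)[1] for s in range(num_states)}
```

Calling `greedy_policy(train())` returns the action the controller would pick in each toy state after training. Because each agent's table covers only its own group, no single agent has to search the full action space, which is the point of the segmentation step; how the paper forms its groups and shapes its reward is not specified in the abstract.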
