Federated Multi-Agent Actor-Critic Learning for Age Sensitive Mobile Edge Computing

Paper Title

Federated Multi-Agent Actor-Critic Learning for Age Sensitive Mobile Edge Computing

Authors

Zheqi Zhu, Shuo Wan, Pingyi Fan, Khaled B. Letaief

Abstract

As an emerging technique, mobile edge computing (MEC) introduces a new processing scheme for various distributed communication-computing systems such as the industrial Internet of Things (IoT), vehicular communication, and smart cities. In this work, we focus on the timeliness of MEC systems, where the freshness of data and computation tasks is significant. First, we formulate a class of age-sensitive MEC models and define the average age of information (AoI) minimization problem of interest. Then, a novel policy-based multi-agent deep reinforcement learning (RL) framework, called heterogeneous multi-agent actor-critic (H-MAAC), is proposed as a paradigm for joint collaboration in the investigated MEC systems, where the edge devices and the center controller learn interactive strategies through their own observations. To improve system performance, we develop the corresponding online algorithm by introducing an edge federated learning mode into the multi-agent cooperation, whose advantages in learning convergence can be guaranteed theoretically. To the best of our knowledge, this is the first joint MEC collaboration algorithm that combines the edge federated mode with multi-agent actor-critic reinforcement learning. Furthermore, we evaluate the proposed approach and compare it with classical RL-based methods. The proposed framework not only outperforms the baseline on average system age, but also improves the stability of the training process. In addition, the simulation results provide some innovative perspectives for system design under edge federated collaboration.
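The abstract names two ingredients without giving formulas: a time-averaged age of information (AoI) objective and a federated step that merges what edge agents have learned. The sketch below is only a generic illustration of those two ideas, not the paper's model or algorithm. It assumes a slotted-time system in which age resets to (delivery slot minus generation slot) whenever an update arrives and otherwise grows by one per slot, and it assumes plain weighted parameter averaging as the federated step. The names `average_aoi` and `federated_average` are hypothetical.

```python
from typing import Dict, List, Optional


def average_aoi(deliveries: Dict[int, int], horizon: int, initial_age: int = 0) -> float:
    """Time-averaged AoI over slots 0..horizon-1 (assumed slotted model).

    deliveries maps a delivery slot to the generation slot of the update
    that arrives in it. This is a generic AoI definition, not the paper's.
    """
    age = initial_age
    total = 0
    for t in range(horizon):
        if t in deliveries:
            # A fresh update arrives: age drops to the update's own age.
            age = t - deliveries[t]
        total += age
        age += 1  # Information ages by one slot until the next delivery.
    return total / horizon


def federated_average(
    local_params: List[List[float]],
    weights: Optional[List[float]] = None,
) -> List[float]:
    """Weighted average of per-agent parameter vectors (assumed FedAvg-style step)."""
    if weights is None:
        weights = [1.0] * len(local_params)
    total_weight = sum(weights)
    dim = len(local_params[0])
    return [
        sum(w * p[i] for w, p in zip(weights, local_params)) / total_weight
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Updates generated at slots 1 and 3 are delivered at slots 2 and 5.
    print(average_aoi({2: 1, 5: 3}, horizon=6))
    # Two edge agents with equal weight.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]]))
```

In a setup like the one the abstract describes, a quantity such as `average_aoi` would play the role of the cost the agents try to reduce, while a step like `federated_average` would periodically synchronize the edge agents' network parameters. How the paper actually defines either is not stated in the abstract.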
