Paper Title
Scientific Workflows in Heterogeneous Edge-Cloud Computing: A Data Placement Strategy Based on Reinforcement Learning
Authors
Abstract
Compared with cloud computing and other traditional distributed computing environments, the heterogeneous edge-cloud computing paradigm can provide a superior solution for deploying scientific workflows. Because scientific datasets vary in size and some of them are subject to privacy concerns, it is essential to find a data placement strategy that minimizes data transmission time. Some state-of-the-art data placement strategies combine edge computing and cloud computing to distribute scientific datasets. However, dynamically distributing newly generated datasets to appropriate datacenters and removing datasets that are no longer needed remain challenging during workflow execution. To address this challenge, this study constructs a data placement model that covers datasets shared within individual workflows and among multiple workflows across various geographical regions, and proposes a two-stage data placement strategy (DYM-RL-DPS). First, in the build-time stage of workflows, we use a discrete particle swarm optimization algorithm with differential evolution to pre-allocate the initial datasets to suitable datacenters. Then, in the runtime stage, we formulate the dynamic dataset distribution problem as a Markov decision process and apply a reinforcement learning-based approach to learn the optimal placement strategy. Comprehensive experiments in simulated heterogeneous edge-cloud computing environments demonstrate the superiority of DYM-RL-DPS: compared with other strategies, it effectively reduces data transmission time.
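The runtime stage can be pictured as a sequential decision problem: each newly generated dataset must be assigned to a datacenter, and a choice that looks cheapest now can use up edge storage that a later dataset needs more. The sketch below is a minimal, hypothetical illustration of that idea, not the DYM-RL-DPS implementation. The three-datacenter topology, bandwidths, capacities and datasets, the state (index of the next dataset plus remaining free storage), the reward (negative transmission time) and the use of plain tabular Q-learning are all illustrative assumptions; privacy constraints, dataset removal and the build-time DPSO-DE stage are omitted.

```python
import random

# Hypothetical toy instance (not from the paper): datacenter 0 is a cloud
# datacenter with ample storage; datacenters 1 and 2 are storage-limited edges.
# BANDWIDTH[i][j] is the bandwidth from datacenter i to j (e.g. MB/s).
BANDWIDTH = [
    [0.0, 10.0, 10.0],
    [10.0, 0.0, 50.0],
    [10.0, 50.0, 0.0],
]
CAPACITY = (1000, 100, 100)  # free storage per datacenter (MB)

# Newly generated datasets, placed one by one at runtime:
# (size in MB, datacenter of the producing task, datacenters of consuming tasks)
DATASETS = [
    (50, 1, [1]),
    (100, 1, [1]),
    (50, 2, [2]),
]


def transfer_time(size, src, dst):
    """Time to move `size` MB from src to dst (zero within one datacenter)."""
    return 0.0 if src == dst else size / BANDWIDTH[src][dst]


def placement_cost(index, dc):
    """Transmission time of storing dataset `index` at `dc`: the upload from its
    producer plus one transfer to each consuming task."""
    size, producer, consumers = DATASETS[index]
    return transfer_time(size, producer, dc) + sum(
        transfer_time(size, dc, c) for c in consumers
    )


def feasible_actions(state):
    """Datacenters with enough free storage for the next dataset."""
    index, free = state
    size = DATASETS[index][0]
    return [dc for dc in range(len(free)) if free[dc] >= size]


def step(state, dc):
    """MDP transition: store the next dataset at `dc`, return (next_state, reward)."""
    index, free = state
    new_free = list(free)
    new_free[dc] -= DATASETS[index][0]
    return (index + 1, tuple(new_free)), -placement_cost(index, dc)


def q_learning(episodes=5000, alpha=1.0, gamma=1.0, seed=0):
    """Tabular Q-learning; alpha=1 is acceptable because this toy MDP is deterministic."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return (negative total transmission time)
    for _ in range(episodes):
        state = (0, CAPACITY)
        while state[0] < len(DATASETS):
            # Uniformly random behaviour policy: Q-learning is off-policy, so the
            # table still converges to the optimal action values.
            action = rng.choice(feasible_actions(state))
            next_state, reward = step(state, action)
            future = 0.0
            if next_state[0] < len(DATASETS):
                future = max(q.get((next_state, a), 0.0) for a in feasible_actions(next_state))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = next_state
    return q


def rollout(choose):
    """Place all datasets with policy `choose(state) -> dc`; return (placement, total time)."""
    state, placement, total = (0, CAPACITY), [], 0.0
    while state[0] < len(DATASETS):
        action = choose(state)
        state, reward = step(state, action)
        placement.append(action)
        total -= reward
    return placement, total


def learned_policy(q):
    """Greedy policy with respect to the learned Q-table."""
    return lambda s: max(feasible_actions(s), key=lambda a: q.get((s, a), float("-inf")))


def myopic_policy(s):
    """Baseline: cheapest feasible datacenter for the current dataset only."""
    return min(feasible_actions(s), key=lambda a: placement_cost(s[0], a))


if __name__ == "__main__":
    print("myopic :", rollout(myopic_policy))
    print("learned:", rollout(learned_policy(q_learning())))
```

On this toy instance, the myopic baseline stores the first dataset at edge datacenter 1, which later forces the larger dataset off its preferred edge, for a total transmission time of 6.0. The learned policy should instead place the datasets at datacenters 2, 1 and 2, for a total of 2.0. A tabular Q-table only works for tiny state spaces like this one; a realistic multi-workflow setting would need a richer state encoding or function approximation.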