Paper title
Model-Free Prediction of Partially Observable Spatiotemporal Chaotic Systems
Bring the BitCODE -- Moving Compute and Data in Distributed Heterogeneous Systems
Paper authors
Paper abstract
In this paper, we present a framework for moving compute and data between processing elements in a distributed heterogeneous system. The implementation of the framework is based on the LLVM compiler toolchain combined with the UCX communication framework. The framework can generate binary machine code or LLVM bitcode for multiple CPU architectures and move the code to remote machines while dynamically optimizing and linking the code on the target platform. The remotely injected code can recursively propagate itself to other remote machines or generate new code. The goal of this paper is threefold: (a) to present an architecture and implementation of the framework that provides essential infrastructure to program a new class of disaggregated systems wherein heterogeneous programming elements such as compute nodes and data processing units (DPUs) are distributed across the system, (b) to demonstrate how the framework can be integrated with modern, high-level programming languages such as Julia, and (c) to demonstrate and evaluate a new class of eXtended Remote Direct Memory Access (X-RDMA) communication operations that are enabled by this framework. To evaluate the capabilities of the framework, we used a cluster with Fujitsu CPUs and a heterogeneous cluster with Intel CPUs and BlueField-2 DPUs, interconnected using a high-performance RDMA fabric. We demonstrated an X-RDMA pointer chase application that outperforms an RDMA GET-based implementation by 70% and is as fast as Active Messages, but does not require function predeployment on remote platforms.
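The pointer chase comparison in the abstract can be illustrated with a toy, single-process sketch. It is not the framework's API: `RemoteNode`, `get`, `run`, and the chase functions are hypothetical names introduced here. The sketch counts only simulated network round trips and does not model LLVM bitcode, UCX, or real latency. It assumes a GET-driven chase costs one round trip per hop, while a chase shipped to the remote side costs one round trip in total.

```python
from dataclasses import dataclass


@dataclass
class RemoteNode:
    """Hypothetical stand-in for a remote machine holding a pointer table."""
    memory: dict          # address -> next address
    round_trips: int = 0  # simulated network round trips seen by this node

    def get(self, address):
        """Models a one-sided GET: one round trip per pointer read."""
        self.round_trips += 1
        return self.memory[address]

    def run(self, func, *args):
        """Models shipping code to the node: one round trip, then local work."""
        self.round_trips += 1
        return func(self.memory, *args)


def chase_local(memory, head, hops):
    """Follow `hops` pointers entirely inside the memory of one node."""
    address = head
    for _ in range(hops):
        address = memory[address]
    return address


def chase_with_get(node, head, hops):
    """Initiator-driven chase: every hop is a separate remote read."""
    address = head
    for _ in range(hops):
        address = node.get(address)
    return address


def chase_with_shipped_code(node, head, hops):
    """Code-shipping chase: send the loop once and receive the final address."""
    return node.run(chase_local, head, hops)


def make_node(size=16):
    """Build a node whose pointers form a permutation of range(size)."""
    return RemoteNode(memory={i: (i * 7 + 3) % size for i in range(size)})


if __name__ == "__main__":
    get_node, shipped_node = make_node(), make_node()
    same = chase_with_get(get_node, 0, 10) == chase_with_shipped_code(shipped_node, 0, 10)
    print("same result:", same)
    print("GET round trips:", get_node.round_trips)
    print("shipped-code round trips:", shipped_node.round_trips)
```

Both variants return the same final address. In this model the difference is only in how many times the initiator has to cross the network, which is the cost that moving the computation to the data is meant to avoid.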