Paper title
On the Relevance of Wait-free Coordination Algorithms in Shared-Memory HPC: The Global Virtual Time Case
Paper authors
Abstract
High-performance computing on shared-memory/multi-core architectures can suffer from non-negligible performance bottlenecks due to coordination algorithms, which are nevertheless necessary to ensure overall correctness and/or to support housekeeping operations such as recovering computing resources (e.g., memory). Although more complex to design and develop, a paradigm switch from classical coordination algorithms to wait-free ones could significantly boost the performance of HPC applications. In this paper we explore the relevance of this paradigm shift in shared-memory architectures by focusing on Parallel Discrete Event Simulation, where the Global Virtual Time (GVT) represents a fundamental coordination algorithm. It computes a lower bound on the logical time reached by all the entities participating in a parallel/distributed computation. Hence it can be used to determine which events belong to the past history of the computation and can therefore be considered committed, which allows for memory recovery (e.g., of obsolete logs taken to support state recoverability) and non-revocable operations (e.g., I/O). We compare the reference (blocking) algorithm for shared memory, the one proposed by Fujimoto and Hybinette \cite{Fuj97}, with an innovative wait-free implementation, emphasizing which design choices must be made to enforce this paradigm shift and what the performance implications are of removing critical sections in coordination algorithms.
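To make the role of GVT concrete, the following is a minimal sequential sketch of the definition only. It is neither the Fujimoto and Hybinette algorithm nor the wait-free implementation discussed in the paper, and it involves no concurrency at all. The `Worker` class, its fields, and both function names are hypothetical. The sketch also assumes that timestamps of messages still in flight are included in the minimum, as is usual when GVT is defined for optimistic simulation.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical simulation entity; field names are illustrative only."""
    lvt: float                                      # local virtual time of this entity
    in_flight: list = field(default_factory=list)   # timestamps of sent but unprocessed messages


def compute_gvt(workers):
    """Return a lower bound on the logical time reached by all workers.

    Assumption: in-flight message timestamps count toward the bound,
    since such a message may still force its receiver back in time.
    """
    candidates = []
    for w in workers:
        candidates.append(w.lvt)
        candidates.extend(w.in_flight)
    return min(candidates)


def committed_events(event_timestamps, gvt):
    """Events strictly older than GVT belong to the past history of the run."""
    return [t for t in event_timestamps if t < gvt]


workers = [
    Worker(lvt=12.0, in_flight=[9.5]),
    Worker(lvt=10.0),
    Worker(lvt=15.0, in_flight=[11.0]),
]
gvt = compute_gvt(workers)
print(gvt)                                            # 9.5
print(committed_events([3.0, 9.5, 8.0, 12.0], gvt))   # [3.0, 8.0]
```

Events below the bound are the ones whose logs could be reclaimed and whose I/O could be released. In a real shared-memory setting the difficulty lies in computing this minimum while workers keep advancing, which is where the choice between blocking and wait-free coordination matters.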