Scalable Shared-Memory Hypergraph Partitioning

Paper Title

Scalable Shared-Memory Hypergraph Partitioning

Authors

Lars Gottesbüren, Tobias Heuer, Peter Sanders, Sebastian Schlag

Abstract

Hypergraph partitioning is an important preprocessing step for optimizing data placement and minimizing communication volumes in high-performance computing applications. To cope with ever-growing problem sizes, it has become increasingly important to develop fast parallel partitioning algorithms whose solution quality is competitive with existing sequential algorithms. To this end, we present Mt-KaHyPar, the first shared-memory multilevel hypergraph partitioner with parallel implementations of many techniques used by the sequential, high-quality partitioning systems: a parallel coarsening algorithm that uses parallel community detection as guidance, initial partitioning via parallel recursive bipartitioning with work-stealing, a scalable label propagation refinement algorithm, and the first fully parallel direct $k$-way formulation of the classical FM algorithm. Experiments performed on a large benchmark set of instances from various application domains demonstrate the scalability and effectiveness of our approach. With 64 cores, we observe self-relative speedups of up to 51 and a harmonic mean speedup of 23.5. In terms of solution quality, we outperform the distributed hypergraph partitioner Zoltan on 95% of the instances while also being a factor of 2.1 faster. With just four cores, Mt-KaHyPar is also slightly faster than the fastest sequential multilevel partitioner PaToH while producing better solutions on 83% of all instances. The sequential high-quality partitioner KaHyPar still finds better solutions than our parallel approach, especially when using max-flow-based refinement. This, however, comes at the cost of considerably longer running times.
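
The abstract reports self-relative speedups and summarizes them with a harmonic mean. The following is a minimal sketch of how such a summary could be computed, not the authors' evaluation code. The function name `self_relative_speedups` and the timings are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from statistics import harmonic_mean


def self_relative_speedups(single_thread_times, parallel_times):
    """Per-instance speedup of an algorithm relative to its own
    single-threaded run: T(1 thread) / T(p threads)."""
    return [t1 / tp for t1, tp in zip(single_thread_times, parallel_times)]


# Hypothetical running times in seconds for three instances
# (illustrative values, not measurements from the paper).
t_1_thread = [120.0, 300.0, 45.0]
t_64_threads = [4.0, 6.0, 3.0]

speedups = self_relative_speedups(t_1_thread, t_64_threads)

# The harmonic mean weights slow-scaling instances more heavily than the
# arithmetic mean, so a few very high speedups cannot dominate the summary.
summary = harmonic_mean(speedups)
arithmetic = sum(speedups) / len(speedups)

print(speedups, summary, arithmetic)
```

With these made-up timings the harmonic mean is lower than the arithmetic mean, which is why it is a more conservative way to aggregate speedups across a benchmark set.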
