Optimization of the Sparse Multi-Threaded Cholesky Factorization for A64FX

Paper Title

Optimization of the Sparse Multi-Threaded Cholesky Factorization for A64FX

Authors

Le Fèvre, Valentin; Usui, Tetsuzo; Casas, Marc

Abstract

Sparse linear algebra routines are fundamental building blocks of a wide variety of scientific applications. Direct solvers, which solve linear systems by factorizing matrices into products of triangular matrices, are commonly used in many contexts. The Cholesky factorization is the fastest direct method for symmetric positive definite matrices. This paper presents selective nesting, a method to determine the optimal task granularity for the parallel Cholesky factorization based on the structure of the sparse matrix. We propose the OPT-D-COST algorithm, which applies selective nesting automatically and dynamically. OPT-D-COST leverages matrix sparsity to drive complex task-based parallel workloads in the context of direct solvers. We run an extensive evaluation campaign on a heterogeneous set of 60 sparse matrices and a parallel machine featuring the A64FX processor. OPT-D-COST delivers an average speedup of 1.46× over the best state-of-the-art parallel method for running direct solvers.
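To make the "factorization into triangular matrices" idea concrete, here is a minimal dense sketch in plain Python. It is a textbook illustration only, assuming a small symmetric positive definite matrix stored as a list of lists: it computes A = L·Lᵀ row by row (the Cholesky-Banachiewicz ordering) and then solves A·x = b with one forward and one backward triangular solve. It does not model sparsity, multithreading, task granularity, selective nesting, or OPT-D-COST; the function names `cholesky` and `solve_spd` are hypothetical and not taken from the paper.

```python
import math


def cholesky(a):
    """Return lower-triangular L such that a == L @ L^T.

    `a` is a dense symmetric positive definite matrix (list of lists).
    Raises ValueError if a non-positive pivot appears (matrix not SPD).
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_spd(a, b):
    """Solve a x = b: factor a = L L^T, then L y = b, then L^T x = y."""
    L = cholesky(a)
    n = len(b)
    # Forward substitution with the lower-triangular factor L.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution with the upper-triangular factor L^T.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    print(cholesky(A))                      # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
    print(solve_spd(A, [0.0, 6.0, 39.0]))   # [1.0, 1.0, 1.0]
```

In a sparse direct solver most entries of A (and of L) are zero, so production codes skip that work and organize the factorization into blocks that can run as parallel tasks. Choosing how finely to split those tasks is the granularity question the paper's selective nesting addresses; the sketch above deliberately leaves it out.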