Paper title
A modular massively parallel computing environment for three-dimensional multiresolution simulations of compressible flows
Paper authors
Paper abstract
The numerical investigation of compressible flows faces two main challenges. To describe the flow characteristics accurately, high-resolution nonlinear numerical schemes are needed that capture discontinuities and resolve a wide range of convective, acoustic, and interfacial scales. Simulating realistic 3D problems with state-of-the-art finite-volume methods (FVM) based on approximate Riemann solvers with weighted nonlinear reconstruction schemes requires high-performance computing (HPC) architectures. Efficient compression algorithms reduce the computational and memory load. Fully adaptive multiresolution (MR) algorithms with local time stepping (LTS) have proven their potential for such applications. While modern CPUs require multiple levels of parallelism to achieve peak performance, the fine-grained MR mesh adaptivity results in challenging compute/communication patterns. Moreover, LTS incurs strong data dependencies, which challenge any parallelization strategy. We address these challenges with a block-based MR algorithm in which arbitrary cuts in the underlying octree are possible. This allows for parallelization on distributed-memory machines via the Message Passing Interface (MPI). We obtain neighbor relations by simple bit logic in a modified Morton order. The block-based concept allows for a modular setup of the source code framework in which the building blocks of the algorithm, such as the choice of the Riemann solver or the reconstruction stencil, are interchangeable without loss of parallel performance. We demonstrate the capabilities of the modular framework with a range of test cases and a scaling analysis with effective resolutions beyond one billion cells using $\mathcal{O}(10^3)$ cores.
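To illustrate how neighbor relations can be obtained by bit logic on a space-filling-curve index, the following is a minimal sketch for the standard 3D Morton (Z-order) index on a single refinement level. It is not the modified Morton order used in the paper, whose details the abstract does not give. The names `morton_encode`, `morton_decode`, and `neighbor`, and the choice of 10 bits per axis, are assumptions made for this example only.

```python
# Sketch only: standard 3D Morton order on one level, not the paper's modified variant.
BITS = 10  # assumed bits per axis, i.e. a 1024^3 index space

# Bit masks selecting the interleaved bits that belong to each axis.
X_MASK = sum(1 << (3 * i) for i in range(BITS))
AXIS_MASKS = (X_MASK, X_MASK << 1, X_MASK << 2)
FULL_MASK = (1 << (3 * BITS)) - 1


def morton_encode(x, y, z):
    """Interleave the bits of x, y, z into one Morton index."""
    code = 0
    for i in range(BITS):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode(code):
    """Recover (x, y, z) from a Morton index."""
    coords = [0, 0, 0]
    for i in range(BITS):
        for axis in range(3):
            coords[axis] |= ((code >> (3 * i + axis)) & 1) << i
    return tuple(coords)


def neighbor(code, axis, step):
    """Return the Morton index of the face neighbor along `axis` (0, 1, 2)
    in direction `step` (+1 or -1), or None at the domain boundary.
    Works directly on the interleaved bits without decoding."""
    mask = AXIS_MASKS[axis]
    others = FULL_MASK ^ mask
    part = code & mask
    if step == 1:
        if part == mask:  # already at the maximum coordinate
            return None
        # Filling the other axes' bits with ones lets the carry ripple
        # through them to the next bit of this axis.
        new_part = ((code | others) + 1) & mask
    elif step == -1:
        if part == 0:  # already at coordinate zero
            return None
        # The borrow ripples through the zeroed bits of the other axes.
        new_part = (part - 1) & mask
    else:
        raise ValueError("step must be +1 or -1")
    return (code & others) | new_part
```

A neighbor lookup then takes a few integer operations per axis, for example `morton_decode(neighbor(morton_encode(3, 5, 2), 1, -1))` yields the cell at `(3, 4, 2)`. An adaptive octree would additionally need the refinement level in the index and a rule for neighbors on coarser or finer levels, which this sketch does not cover.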