GPU-Accelerated DNS of Compressible Turbulent Flows

Paper Title

GPU-Accelerated DNS of Compressible Turbulent Flows

Authors

Kim, Youngdae; Ghosh, Debojyoti; Constantinescu, Emil M.; Balakrishnan, Ramesh

Abstract

This paper explores strategies to transform an existing CPU-based high-performance computational fluid dynamics solver, HyPar, for compressible flow simulations on emerging exascale heterogeneous (CPU+GPU) computing platforms. The scientific motivation for developing a GPU-enhanced version of HyPar is to simulate canonical turbulent flows at the highest resolution possible on such platforms. We show that optimizing memory operations and thread blocks results in 200x speedup of computationally intensive kernels compared with a CPU core. Using multiple GPUs and CUDA-aware MPI communication, we demonstrate both strong and weak scaling of our GPU-based HyPar implementation on the NVIDIA Volta V100 GPUs. We simulate the decay of homogeneous isotropic turbulence in a triply periodic box on grids with up to $1024^3$ points (5.3 billion degrees of freedom) and on up to 1,024 GPUs. We compare the wall times for CPU-only and CPU+GPU simulations. The results presented in the paper are obtained on the Summit and Lassen supercomputers at Oak Ridge and Lawrence Livermore National Laboratories, respectively.
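
As a quick consistency check on the quoted problem size, and assuming the usual five conserved variables per grid point for the 3D compressible Navier-Stokes equations (density, three momentum components, and energy; the abstract does not state this count explicitly), the degrees of freedom on the largest grid work out as:

$$
N_{\mathrm{DOF}} = 5 \times 1024^3 = 5 \times 1{,}073{,}741{,}824 = 5{,}368{,}709{,}120 \approx 5.37 \times 10^{9},
$$

which is consistent with the 5.3 billion degrees of freedom quoted above.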

Paper Title