Paper Title

Performance optimization and analysis of the unstructured Discontinuous Galerkin solver on multi-core and many-core architectures

Authors

Dai, Zhe; D, Liang; Wang, Yueqin; Wang, Fang; Ming, Li; Zhang, Jian

Abstract

The discontinuous Galerkin (DG) algorithm is a representative high-order method in computational fluid dynamics (CFD) that offers considerable mathematical advantages, such as high resolution and low dissipation and dispersion. However, DG is computationally intensive, which makes it costly to apply to practical engineering problems. This paper discusses the implementation of our in-house practical DG application in three different programming models, along with optimization techniques, including grid renumbering and mixed precision, that maximize performance on a single-node system. Experiments on CPU and GPU show that our CUDA-, OpenACC-, and OpenMP-based codes achieve maximum speedups of 42.9x, 35.3x, and 8.1x, respectively, over serial execution of the original application. In addition, we systematically compare the programming models in two respects: performance and productivity. Our empirical conclusions help programmers select the right platform and a suitable programming model for their target applications.
