Learning to Parallelize in a Shared-Memory Environment with Transformers

Paper title

Learning to Parallelize in a Shared-Memory Environment with Transformers

Authors

Re'em Harel, Yuval Pinter, Gal Oren

Abstract

In recent years, the world has switched to many-core and multi-core shared-memory architectures. As a result, there is a growing need to utilize these architectures by introducing shared-memory parallelization schemes into software applications. OpenMP is the most comprehensive API that implements such schemes, characterized by a readable interface. Nevertheless, introducing OpenMP into code is challenging due to pervasive pitfalls in the management of parallel shared memory. To facilitate this task, many source-to-source (S2S) compilers have been created over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. In this work, we propose leveraging recent advances in ML techniques, specifically in natural language processing (NLP), to replace S2S compilers altogether. We create a database (corpus), Open-OMP, specifically for this goal. Open-OMP contains over 28,000 code snippets, half of which contain OpenMP directives while the other half, with high probability, do not need parallelization at all. We use the corpus to train systems to automatically classify code segments in need of parallelization, as well as to suggest individual OpenMP clauses. We train several transformer models, named PragFormer, for these tasks, and show that they outperform statistically-trained baselines and automatic S2S parallelization compilers both in classifying the overall need for an OpenMP directive and in introducing private and reduction clauses. Our source code and database are available at: https://github.com/Scientific-Computing-Lab-NRCN/PragFormer.
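To make the terms in the abstract concrete, the following is a minimal sketch of an OpenMP directive carrying the two clauses it mentions, `private` and `reduction`, next to a loop that is better left serial. The function names `sum_of_squares` and `prefix_sum` are hypothetical and chosen only for illustration; they are not taken from Open-OMP or PragFormer, and the serial loop is only an assumption about what a snippet without a directive might look like.

```c
#include <assert.h>

/* Hypothetical example: sum of squares of the first n elements of x.
 * reduction(+:total) gives every thread its own partial sum and adds
 * the partial sums together when the loop ends.
 * private(sq) gives every thread its own copy of the scratch variable,
 * so threads do not overwrite each other's intermediate value. */
double sum_of_squares(const double *x, int n) {
    assert(n >= 0);
    double total = 0.0;
    double sq;
    #pragma omp parallel for private(sq) reduction(+:total)
    for (int i = 0; i < n; i++) {
        sq = x[i] * x[i];
        total += sq;
    }
    return total;
}

/* Hypothetical example: in-place prefix sum.
 * Iteration i reads the value written by iteration i - 1, so adding a
 * plain "parallel for" here would produce wrong results. The loop is
 * therefore left without a directive. */
void prefix_sum(double *x, int n) {
    assert(n >= 0);
    for (int i = 1; i < n; i++) {
        x[i] += x[i - 1];
    }
}
```

With an OpenMP-aware compiler (for example `gcc -fopenmp`), the first loop runs in parallel; without that flag the pragma is ignored and the same code runs serially with the same result.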
