AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation

Paper Title

AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation

Authors

André Müller, Bertil Schmidt, Andreas Hildebrandt, Richard Membarth, Roland Leißa, Matthis Kruse, Sebastian Hack

Abstract

Sequence alignments are fundamental to bioinformatics, which has resulted in a variety of optimized implementations. Unfortunately, the vast majority of them are hand-tuned and specific to certain architectures and execution models. This not only makes them challenging to understand and extend, but also difficult to port to other platforms. We present AnySeq, a novel library for computing different types of pairwise alignments of DNA sequences. Our approach combines high performance with an intuitively understandable implementation, which is achieved through the concept of partial evaluation. Using the AnyDSL compiler framework, AnySeq enables the compilation of algorithmic variants that are highly optimized for specific usage scenarios and hardware targets from a single, uniform codebase. The resulting domain-specific library thus allows the variation of alignment parameters (such as alignment type, scoring scheme, and traceback vs. plain score) by simple function composition rather than by metaprogramming techniques, which are often hard to understand. Our implementation supports multithreading and SIMD vectorization on CPUs, CUDA-enabled GPUs, and FPGAs. AnySeq is at most 7% slower, and in many cases faster (by up to 12%), than state-of-the-art manually optimized alignment libraries on CPUs (SeqAn) and on GPUs (NVBio).
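The abstract's central idea, selecting the alignment type and scoring scheme by composing functions rather than through metaprogramming, can be illustrated with a small score-only sketch. This is not AnySeq's API: the names `simple_scoring`, `GLOBAL`, `LOCAL` and `make_aligner` are hypothetical, the gap penalty is assumed to be linear for simplicity, and the traceback variant is omitted.

```python
from typing import Callable, Dict, List

Matrix = List[List[int]]


def simple_scoring(match: int, mismatch: int) -> Callable[[str, str], int]:
    """Hypothetical scoring scheme: fixed match/mismatch scores."""
    def score(a: str, b: str) -> int:
        return match if a == b else mismatch
    return score


# Hypothetical alignment-type policies (illustrative names, not AnySeq's).
GLOBAL: Dict[str, Callable] = {
    "init": lambda i, gap: i * gap,        # leading gaps are penalised
    "clamp": lambda v: v,                  # scores may become negative
    "result": lambda H: H[-1][-1],         # score of the full-length alignment
}
LOCAL: Dict[str, Callable] = {
    "init": lambda i, gap: 0,              # an alignment may start anywhere
    "clamp": lambda v: max(v, 0),          # Smith-Waterman zero floor
    "result": lambda H: max(max(row) for row in H),  # best cell anywhere
}


def make_aligner(kind: Dict[str, Callable],
                 score: Callable[[str, str], int],
                 gap: int) -> Callable[[str, str], int]:
    """Compose an alignment type, a scoring scheme and a linear gap
    penalty into a score-only pairwise aligner (no traceback)."""
    def align(q: str, s: str) -> int:
        n, m = len(q), len(s)
        H: Matrix = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            H[i][0] = kind["init"](i, gap)
        for j in range(1, m + 1):
            H[0][j] = kind["init"](j, gap)
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                best = max(H[i - 1][j - 1] + score(q[i - 1], s[j - 1]),
                           H[i - 1][j] + gap,   # q[i-1] aligned to a gap
                           H[i][j - 1] + gap)   # s[j-1] aligned to a gap
                H[i][j] = kind["clamp"](best)
        return kind["result"](H)
    return align


dna = simple_scoring(match=2, mismatch=-1)
global_score = make_aligner(GLOBAL, dna, gap=-2)
local_score = make_aligner(LOCAL, dna, gap=-2)

print(global_score("ACGT", "AGT"))          # 4
print(local_score("TTACGTTT", "GGACGGG"))   # 6
```

Swapping `GLOBAL` for `LOCAL`, or `simple_scoring` for another scheme, changes the variant without touching the recurrence. The abstract attributes AnySeq's performance to partial evaluation in AnyDSL, which specializes such compositions when compiling; plain Python, by contrast, still calls through each closure at run time.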
