Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

Title

Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

Author

Gu, Zijing

Abstract

We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and to generate efficient CUDA code. With TVM's automatic parameter tuning, our implementation based on cross-thread reduction achieved competitive or better performance compared with other state-of-the-art frameworks.
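
As a rough illustration of the operation the abstract describes, the pure-Python sketch below gives reference semantics for a dense × block-sparse product and mimics, sequentially on the CPU, a cross-thread reduction: the reduction axis of each output element is split across several simulated threads, each keeps a partial sum, and the partials are combined by a tree reduction. It is not the paper's TVM schedule and uses no TVM API. Assumptions: the sparse operand W is stored in block compressed sparse row (BSR) form as `data`/`indices`/`indptr`, the product is Y = X · Wᵀ, and the names `dense_to_bsr`, `tree_reduce`, `bsr_dense_matmul` and `num_threads` are hypothetical, introduced only for this example.

```python
from typing import List

Matrix = List[List[float]]


def dense_to_bsr(w: Matrix, bs_r: int, bs_c: int):
    """Convert a dense N x K matrix into BSR arrays (data, indices, indptr).

    Hypothetical helper: only blocks with at least one nonzero are stored.
    """
    n, k = len(w), len(w[0])
    assert n % bs_r == 0 and k % bs_c == 0, "shape must divide into blocks"
    data, indices, indptr = [], [], [0]
    for br in range(n // bs_r):
        for bc in range(k // bs_c):
            block = [row[bc * bs_c:(bc + 1) * bs_c]
                     for row in w[br * bs_r:(br + 1) * bs_r]]
            if any(v != 0 for row in block for v in row):
                data.append(block)   # dense bs_r x bs_c block
                indices.append(bc)   # block-column index of this block
        indptr.append(len(data))     # end of block row br in data/indices
    return data, indices, indptr


def tree_reduce(partials: List[float]) -> float:
    """Stride-halving tree reduction, like a shared-memory reduction on a GPU."""
    vals = list(partials)
    while len(vals) > 1:
        half = (len(vals) + 1) // 2
        vals = [vals[t] + (vals[t + half] if t + half < len(vals) else 0.0)
                for t in range(half)]
    return vals[0]


def bsr_dense_matmul(x: Matrix, data, indices, indptr, bs_r: int, bs_c: int,
                     num_threads: int = 4) -> Matrix:
    """Y[i][j] = sum_k X[i][k] * W[j][k], with W given in BSR form.

    The nonzero blocks of W's block row form the reduction axis; it is split
    (strided) across `num_threads` simulated threads, then tree-reduced.
    """
    m = len(x)
    n = (len(indptr) - 1) * bs_r
    y = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            bj, r = divmod(j, bs_r)  # block row of W, row inside the block
            lo, hi = indptr[bj], indptr[bj + 1]
            partials = [0.0] * num_threads
            for t in range(num_threads):  # simulated threadIdx.x
                for p in range(lo + t, hi, num_threads):
                    base = indices[p] * bs_c
                    for c in range(bs_c):
                        partials[t] += x[i][base + c] * data[p][r][c]
            y[i][j] = tree_reduce(partials)  # combine per-thread partials
    return y


if __name__ == "__main__":
    # Hypothetical 4x4 weight with 2x2 blocks; only two blocks are nonzero.
    w = [[1, 2, 0, 0],
         [3, 4, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 5, 6]]
    x = [[1, 1, 1, 1],
         [0, 1, 0, 2]]
    data, indices, indptr = dense_to_bsr(w, 2, 2)
    y = bsr_dense_matmul(x, data, indices, indptr, 2, 2, num_threads=2)
    print(y)  # [[3.0, 7.0, 0.0, 11.0], [2.0, 4.0, 0.0, 12.0]]
```

Splitting each output's reduction over its nonzero blocks, rather than giving every output element to a single thread, is one plausible way to keep many threads busy when an output row depends on only a few blocks. In an actual CUDA kernel the partial sums would be combined through shared memory or warp shuffles instead of a Python list.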