Kernel Operations on the GPU, with Autodiff, without Memory Overflows

Paper title

Kernel Operations on the GPU, with Autodiff, without Memory Overflows

Authors

Charlier, Benjamin; Feydy, Jean; Glaunès, Joan Alexis; Collin, François-David; Durif, Ghislain

Abstract

The KeOps library provides fast and memory-efficient GPU support for tensors whose entries are given by a mathematical formula, such as kernel and distance matrices. KeOps alleviates the major bottleneck of tensor-centric libraries for kernel and geometric applications: memory consumption. It also supports automatic differentiation and outperforms standard GPU baselines, including PyTorch CUDA tensors and the Halide and TVM libraries. KeOps combines optimized C++/CUDA schemes with bindings for high-level languages: Python (NumPy and PyTorch), Matlab and GNU R. As a result, high-level "quadratic" code can now scale up to large data sets, with millions of samples processed in seconds. KeOps brings graphics-like performance to kernel methods and is freely available on standard repositories (PyPI, CRAN). To showcase its versatility, we provide tutorials for a wide range of settings online at www.kernel-operations.io.
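The memory argument can be made concrete with a small example. A kernel reduction a_i = Σ_j k(x_i, y_j) b_j only needs the N×M kernel matrix one piece at a time, so it can be evaluated tile by tile without ever storing the full matrix. Differentiating it with respect to x yields another sum over j that can be tiled the same way. The NumPy sketch below illustrates that idea on the CPU, assuming a Gaussian kernel with bandwidth sigma. It does not use KeOps and is not its implementation (which relies on optimized C++/CUDA code); the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_reduction(x, y, b, sigma=1.0, block=256):
    """Return a with a[i] = sum_j exp(-|x[i] - y[j]|^2 / (2 sigma^2)) * b[j].

    x: (N, D) points, y: (M, D) points, b: (M, E) signal. Output: (N, E).
    Rows of x are processed in tiles of `block`, so the (N, M) kernel matrix
    is never stored; the largest temporary is the (block, M, D) difference array.
    """
    n = x.shape[0]
    a = np.empty((n, b.shape[1]))
    for start in range(0, n, block):
        xi = x[start:start + block]                          # (B, D) tile of target points
        diff = xi[:, None, :] - y[None, :, :]                # (B, M, D) pairwise differences
        k = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))  # (B, M) kernel tile
        a[start:start + block] = k @ b                       # reduce over j for this tile
    return a


def gaussian_kernel_reduction_grad_x(x, y, b, g, sigma=1.0, block=256):
    """Gradient of sum(g * a) with respect to x, where a = gaussian_kernel_reduction(x, y, b).

    d/dx[i] = -(1 / sigma^2) * sum_j <g[i], b[j]> * k(x[i], y[j]) * (x[i] - y[j]),
    which is again a sum over j, so it is evaluated with the same tiling.
    """
    n = x.shape[0]
    grad = np.empty(x.shape)
    for start in range(0, n, block):
        xi = x[start:start + block]
        gi = g[start:start + block]                          # (B, E) upstream gradient tile
        diff = xi[:, None, :] - y[None, :, :]                # (B, M, D)
        k = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))  # (B, M)
        w = k * (gi @ b.T)                                   # (B, M): k_ij * <g_i, b_j>
        grad[start:start + block] = -(w[:, :, None] * diff).sum(1) / sigma ** 2
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(1000, 3))
    y = rng.normal(size=(800, 3))
    b = rng.normal(size=(800, 2))
    # Dense reference that materializes the full (1000, 800) kernel matrix.
    K = np.exp(-((x[:, None, :] - y[None, :, :]) ** 2).sum(-1) / 2.0)
    assert np.allclose(gaussian_kernel_reduction(x, y, b, block=128), K @ b)
    print("tiled and dense results agree")
```

The block parameter trades memory for vectorization: larger tiles mean fewer Python-level iterations but a bigger (block, M, D) temporary. Only the dense reference in the usage check allocates the full N×M matrix.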
