Paper Title

Giga-scale Kernel Matrix Vector Multiplication on GPU

Paper Authors

Robert Hu, Siu Lun Chau, Dino Sejdinovic, Joan Alexis Glaunès

Paper Abstract

Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined \textit{Faster-Fast and Free Memory Method} (F$^3$M) to address these scaling issues of KMVM for tall~($10^8\sim 10^9$) and skinny~($D\leq7$) data. Extensive experiments demonstrate that F$^3$M has empirical \emph{linear time and memory} complexity with a relative error of order $10^{-3}$ and can compute a full KMVM for a billion points \emph{in under a minute} on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods. We demonstrate the utility of our procedure by applying it as a drop-in for the state-of-the-art GPU-based linear solver FALKON, \emph{improving speed 1.5-5.5 times} at the cost of $<1\%$ drop in accuracy. We further demonstrate competitive results on \emph{Gaussian Process regression} coupled with significant speedups on a variety of real-world datasets.
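For context, below is a minimal NumPy sketch of the exact KMVM baseline that such approximations aim to replace. This is an illustration of the quadratic brute-force computation only, not the paper's F$^3$M method; the Gaussian kernel, the `lengthscale` parameter, and all variable names are assumptions made for the example.

```python
import numpy as np

def gaussian_kmvm(X, Y, b, lengthscale=1.0):
    """Exact KMVM: return K @ b with K[i, j] = exp(-||x_i - y_j||^2 / (2 ls^2)).

    Builds the full N x M kernel matrix, so time and memory are
    O(N * M) -- quadratic when N == M. This brute-force baseline is
    what fast approximations target at N ~ 10^8-10^9 points.
    """
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed without an
    # (N, M, D) broadcast intermediate.
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * (X @ Y.T)
    sq = np.maximum(sq, 0.0)  # clip tiny negatives from round-off
    K = np.exp(-sq / (2.0 * lengthscale**2))
    return K @ b

# Tall-and-skinny data, as in the paper's regime (large N, D <= 7).
rng = np.random.default_rng(0)
N, D = 2_000, 3              # kept small: the exact kernel matrix is O(N^2)
X = rng.standard_normal((N, D))
b = rng.standard_normal(N)
out = gaussian_kmvm(X, X, b)  # shape (N,)
```

Chunking the rows of K would bound this baseline's memory, but its time would remain quadratic; removing both bottlenecks for tall-and-skinny data is the gap the proposed method addresses.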
