TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU

Paper Title

TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU

Authors

Fan, Shengyu; Wang, Zhiwei; Xu, Weizhi; Hou, Rui; Meng, Dan; Zhang, Mingzhe

Abstract

In this paper, we propose TensorFHE, a GPGPU-based FHE acceleration solution for real applications on encrypted data. TensorFHE uses Tensor Core Units (TCUs) to speed up the Number Theoretic Transform (NTT), which is the most time-consuming part of FHE. Moreover, TensorFHE focuses on performing as many FHE operations as possible in a given time period rather than on reducing the latency of a single operation. Based on this idea, TensorFHE introduces operation-level batching to fully exploit the data parallelism of the GPGPU. We experimentally demonstrate that a GPGPU can achieve performance comparable to state-of-the-art ASIC accelerators. TensorFHE performs 913 KOPS for NTT and 88 KOPS for HMULT (key FHE kernels) on an NVIDIA A100 GPGPU, which is 2.61x faster than the state-of-the-art FHE implementation on GPGPU. Moreover, TensorFHE provides performance comparable to ASIC FHE accelerators, and is even 2.9x faster than F1+ on a specific workload. Such a pure software acceleration based on high-performance commercial hardware can open up the use of state-of-the-art FHE algorithms for a broad set of applications in real systems.
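The two ideas in the abstract, mapping the NTT onto matrix-multiply hardware and batching many operations together, can be illustrated with a toy sketch. This is not the paper's implementation: the modulus `Q = 17`, length `N = 8`, root `OMEGA = 2`, and all function names are assumptions chosen only to keep the example small, and it uses a plain cyclic NTT in pure Python rather than GPU code. It only shows that an NTT can be written as a multiplication by a fixed matrix modulo a prime, and that stacking several polynomials as rows turns many transforms into one matrix-matrix product.

```python
# Toy parameters (assumptions for illustration; real FHE uses far larger ones).
Q = 17      # prime modulus
N = 8       # transform length
OMEGA = 2   # primitive N-th root of unity mod Q: 2**4 % 17 == 16, 2**8 % 17 == 1


def ntt_matrix(root):
    """Return the N x N matrix W with W[i][j] = root**(i*j) mod Q."""
    return [[pow(root, i * j, Q) for j in range(N)] for i in range(N)]


def matmul_mod(a, b):
    """Return (a @ b) mod Q for matrices stored as lists of lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) % Q for j in range(cols)]
        for row in a
    ]


def batched_ntt(batch):
    """Forward NTT of every row of `batch` with a single matrix product.

    W is symmetric, so `batch @ W` transforms each row independently.
    """
    return matmul_mod(batch, ntt_matrix(OMEGA))


def batched_intt(batch):
    """Inverse NTT of every row: multiply by W built from OMEGA^-1, then by N^-1."""
    inv_root = pow(OMEGA, -1, Q)   # modular inverse, Python 3.8+
    inv_n = pow(N, -1, Q)
    out = matmul_mod(batch, ntt_matrix(inv_root))
    return [[(x * inv_n) % Q for x in row] for row in out]


if __name__ == "__main__":
    polys = [
        [1, 0, 0, 0, 0, 0, 0, 0],
        [1, 2, 3, 4, 5, 6, 7, 8],
    ]
    transformed = batched_ntt(polys)
    print(transformed)
    print(batched_intt(transformed) == polys)
```

Each row of `polys` stands for one independent operation's input, so adding rows enlarges the single matrix product instead of launching more small transforms. That is the throughput-oriented view the abstract describes, shown here only at toy scale.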
