Paper Title

ARM 4-bit PQ: SIMD-based Acceleration for Approximate Nearest Neighbor Search on ARM

Authors

Yusuke Matsui, Yoshiki Imaizumi, Naoya Miyamoto, Naoki Yoshifuji

Abstract

We accelerate 4-bit product quantization (PQ) on the ARM architecture. Notably, the high performance of conventional 4-bit PQ relies heavily on x64-specific SIMD registers, such as those provided by AVX2; hence, comparable performance has not yet been achieved on ARM. To fill this gap, we first bundle two 128-bit registers into one 256-bit component. We then apply a shuffle operation to each 128-bit half using ARM-specific NEON instructions. With this simple but critical modification, we achieve a dramatic speedup for 4-bit PQ on the ARM architecture. Experiments show that the proposed method consistently achieves a 10x improvement over naive PQ at the same accuracy.
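The register-bundling idea in the abstract can be illustrated with a small scalar emulation. This is only a sketch under stated assumptions, not the authors' implementation. The function names (`tbl16`, `shuffle_pair`, `scan_block`, `scan_naive`), the nibble-packing layout, and the block size of 16 vectors are all hypothetical choices made for illustration. Each 16-entry list stands in for one 128-bit register of byte-sized distance values, and `tbl16` plays the role that a 16-byte table-lookup (shuffle) instruction would play on real hardware.

```python
def tbl16(table, idx):
    """Stand-in for one 128-bit byte shuffle: out[i] = table[idx[i]].

    `table` holds 16 byte-sized entries; every index is assumed to be in 0..15.
    """
    return [table[i] for i in idx]


def shuffle_pair(tables, indices):
    """Treat two 16-byte halves as one 256-bit unit, shuffling each half separately."""
    (t_lo, t_hi), (i_lo, i_hi) = tables, indices
    return tbl16(t_lo, i_lo), tbl16(t_hi, i_hi)


def scan_block(luts, packed):
    """Accumulate table distances for a block of 16 database vectors.

    luts[m]      : 16 quantized distances for sub-quantizer m (assumed layout).
    packed[p][j] : one byte for vector j; the low nibble is its code for
                   sub-quantizer 2p, the high nibble its code for 2p + 1.
    """
    acc = [0] * 16
    for p, codes in enumerate(packed):
        lo = [c & 0x0F for c in codes]   # 4-bit codes for sub-quantizer 2p
        hi = [c >> 4 for c in codes]     # 4-bit codes for sub-quantizer 2p + 1
        d_lo, d_hi = shuffle_pair((luts[2 * p], luts[2 * p + 1]), (lo, hi))
        acc = [a + x + y for a, x, y in zip(acc, d_lo, d_hi)]
    return acc


def scan_naive(luts, codes):
    """Reference scan: one table lookup per vector and sub-quantizer."""
    return [sum(luts[m][c] for m, c in enumerate(vec)) for vec in codes]
```

In this sketch, `scan_block` resolves 32 lookups per step (16 vectors times 2 sub-quantizers) through the paired shuffle, whereas `scan_naive` resolves them one at a time. Both return the same sums, which mirrors the abstract's claim that the speedup comes at the same accuracy. A real kernel would additionally have to handle accumulator width and overflow, which this sketch ignores.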
