Paper Title

FPIRM: Floating-point Processing in Racetrack Memories

Authors

Sébastien Ollivier, Xinyi Zhang, Yue Tang, Chayanika Choudhuri, Jingtong Hu, Alex K. Jones

Abstract

Convolutional neural networks (CNNs) have become a ubiquitous algorithm with growing applications in mobile and edge settings. We describe a compute-in-memory (CIM) technique called FPIRM that uses Racetrack Memory (RM) to accelerate CNNs for edge systems. Using transverse read, a technique that can determine the number of '1's in multiple adjacent domains, FPIRM can efficiently implement multi-operand bulk-bitwise and addition computations, as well as two-operand multiplication. We discuss how FPIRM can implement both variable-precision integer and floating-point arithmetic. This allows both CNN inference and on-device training without expensive data movement to the cloud. Based on these functions, we demonstrate the implementation of several CNNs with back propagation using RM CIM and compare them to state-of-the-art CIM inference and training implementations on Field-Programmable Gate Arrays (FPGAs). During training, FPIRM improves efficiency by 2$\times$, reducing energy consumption by at least 27% and increasing throughput by at least 18% compared to the FPGA.
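
The mechanism the abstract describes, a transverse read that reports how many adjacent domains hold '1', can be illustrated in software: counting the ones in each bit column (plus carries from the previous column) yields multi-operand addition, and feeding shifted partial products into that adder yields two-operand multiplication. The sketch below is a minimal behavioral model under those assumptions, not the paper's hardware design; `transverse_read`, `multi_operand_add`, and `multiply` are hypothetical names introduced only for illustration.

```python
def transverse_read(domains):
    """Behavioral model of a transverse read (TR): return how many of
    the adjacent domains hold '1'. In FPIRM this count comes from the
    memory hardware itself; here we simply sum a list of bits."""
    return sum(domains)

def multi_operand_add(operands, width):
    """Add N integers by counting '1's per bit column, TR-style.

    Each column count splits into a sum bit (count mod 2) and carries
    (count // 2) injected into the next column, mimicking a
    counter-based CIM adder."""
    total = 0
    carries = 0
    # Extra iterations give headroom for carries from N operands.
    for bit in range(width + len(operands).bit_length()):
        column = [(op >> bit) & 1 for op in operands]  # one domain per operand
        count = transverse_read(column) + carries      # ones in this column
        total |= (count & 1) << bit                    # sum bit for this position
        carries = count >> 1                           # propagate to next column
    return total

def multiply(a, b, width):
    """Two-operand multiplication by reusing the multi-operand adder:
    the shifted partial products of a * b become the operand list."""
    partials = [a << i for i in range(width) if (b >> i) & 1]
    return multi_operand_add(partials, 2 * width) if partials else 0

assert multi_operand_add([3, 5, 7, 9], 4) == 24
assert multiply(6, 7, 4) == 42
```

Extending the same counting primitive to variable precision is then a matter of choosing `width`; the floating-point path described in the paper additionally handles exponent alignment before the mantissa additions, which this integer-only sketch omits.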
