Paper Title
Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals
Paper Authors
Paper Abstract
Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. In particular, resistive random-access memory (RRAM) devices provide a promising hardware substrate for building PIM accelerators due to their ability to realize efficient in-situ vector-matrix multiplications (VMMs). However, existing PIM accelerators suffer from frequent and energy-intensive analog-to-digital (A/D) conversions, which severely limit their performance. This paper presents a new PIM architecture that efficiently accelerates deep learning tasks by minimizing the required A/D conversions through analog accumulation and neurally approximated peripheral circuits. We first characterize the different dataflows employed by existing PIM accelerators, based on which a new dataflow is proposed that markedly reduces the A/D conversions required for VMMs by extending shift-and-add (S+A) operations into the analog domain before the final quantization. We then leverage a neural approximation method to design both the analog accumulation circuits (S+A) and the quantization circuits (ADCs) with RRAM crossbar arrays in a highly efficient manner. Finally, we apply them to build an RRAM-based PIM accelerator (i.e., Neural-PIM) upon the proposed analog dataflow and evaluate its system-level performance. Evaluations on different benchmarks demonstrate that Neural-PIM improves energy efficiency by 5.36x (1.73x) and throughput by 3.43x (1.59x) without losing accuracy, compared with the state-of-the-art RRAM-based PIM accelerators, i.e., ISAAC (CASCADE).
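The key dataflow idea in the abstract can be illustrated with a minimal numerical sketch. The model below is an assumption-laden simplification (not the paper's implementation): inputs are fed to the crossbar one bit-slice at a time, and the two functions contrast the conventional dataflow, which quantizes every bit-slice's analog output before digital shift-and-add, with an analog-accumulation dataflow, which performs the shift-and-add before a single final quantization. With an ideal ADC both produce the same result, but the conversion counts differ, which is the source of the claimed energy savings.

```python
import numpy as np

def vmm_conventional(x_bits, W, adc):
    """Conventional PIM dataflow (simplified model): each input bit-slice's
    crossbar output is quantized immediately, so there is one A/D conversion
    per bit-slice; shift-and-add (S+A) then happens in the digital domain."""
    acc = np.zeros(W.shape[1])
    conversions = 0
    for b, x_b in enumerate(x_bits):        # x_b: one binary input slice
        partial = x_b @ W                   # in-situ analog VMM (modeled exactly)
        acc += adc(partial) * (1 << b)      # quantize first, then digital S+A
        conversions += 1
    return acc, conversions

def vmm_analog_accumulation(x_bits, W, adc):
    """Analog-accumulation dataflow (illustrative of the Neural-PIM idea):
    S+A is carried out in the analog domain, so only the final accumulated
    value is quantized -- a single A/D conversion per output."""
    analog_acc = np.zeros(W.shape[1])
    for b, x_b in enumerate(x_bits):
        analog_acc += (x_b @ W) * (1 << b)  # analog shift-and-add accumulation
    return adc(analog_acc), 1               # one conversion at the very end

# Hypothetical toy operands: 4-bit inputs, small weight matrix, ideal ADC.
x = np.array([5, 3])
W = np.array([[1, 2], [3, 4]])
x_bits = [(x >> b) & 1 for b in range(4)]   # bit-slice the input vector
ideal_adc = lambda v: v                     # lossless quantizer for the demo

r_conv, n_conv = vmm_conventional(x_bits, W, ideal_adc)
r_ana, n_ana = vmm_analog_accumulation(x_bits, W, ideal_adc)
```

Here both dataflows recover the exact product `x @ W`, while the conventional path needs four conversions (one per input bit) against a single conversion for the analog-accumulation path; a real design would also have to model ADC resolution and analog noise, which this sketch omits.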