Paper Title
Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals
Paper Authors
Paper Abstract
Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. In particular, resistive random-access memory (RRAM) devices provide a promising hardware substrate for building PIM accelerators due to their ability to realize efficient in-situ vector-matrix multiplications (VMMs). However, existing PIM accelerators suffer from frequent and energy-intensive analog-to-digital (A/D) conversions, which severely limit their performance. This paper presents a new PIM architecture that efficiently accelerates deep learning tasks by minimizing the required A/D conversions through analog accumulation and neurally approximated peripheral circuits. We first characterize the different dataflows employed by existing PIM accelerators, based on which a new dataflow is proposed that markedly reduces the A/D conversions required for VMMs by extending shift-and-add (S+A) operations into the analog domain before the final quantization. We then leverage a neural approximation method to design both the analog accumulation circuits (S+A) and the quantization circuits (ADCs) with RRAM crossbar arrays in a highly efficient manner. Finally, we apply them to build an RRAM-based PIM accelerator (i.e., Neural-PIM) upon the proposed analog dataflow and evaluate its system-level performance. Evaluations on different benchmarks demonstrate that Neural-PIM improves energy efficiency by 5.36x (1.73x) and throughput by 3.43x (1.59x) without losing accuracy, compared with the state-of-the-art RRAM-based PIM accelerators, i.e., ISAAC (CASCADE).
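The key dataflow idea in the abstract can be illustrated with a minimal numerical sketch. The model below is an assumption-laden simplification (not the paper's implementation): inputs are fed to the crossbar one bit-slice at a time, and the two functions contrast the conventional dataflow, which quantizes every bit-slice's analog output before digital shift-and-add, with an analog-accumulation dataflow, which performs the shift-and-add before a single final quantization. With an ideal ADC both produce the same result, but the conversion counts differ, which is the source of the claimed energy savings.

```python
import numpy as np

def vmm_conventional(x_bits, W, adc):
    """Conventional PIM dataflow (simplified model): each input bit-slice's
    crossbar output is quantized immediately, so there is one A/D conversion
    per bit-slice; shift-and-add (S+A) then happens in the digital domain."""
    acc = np.zeros(W.shape[1])
    conversions = 0
    for b, x_b in enumerate(x_bits):        # x_b: one binary input slice
        partial = x_b @ W                   # in-situ analog VMM (modeled exactly)
        acc += adc(partial) * (1 << b)      # quantize first, then digital S+A
        conversions += 1
    return acc, conversions

def vmm_analog_accumulation(x_bits, W, adc):
    """Analog-accumulation dataflow (illustrative of the Neural-PIM idea):
    S+A is carried out in the analog domain, so only the final accumulated
    value is quantized -- a single A/D conversion per output."""
    analog_acc = np.zeros(W.shape[1])
    for b, x_b in enumerate(x_bits):
        analog_acc += (x_b @ W) * (1 << b)  # analog shift-and-add accumulation
    return adc(analog_acc), 1               # one conversion at the very end

# Hypothetical toy operands: 4-bit inputs, small weight matrix, ideal ADC.
x = np.array([5, 3])
W = np.array([[1, 2], [3, 4]])
x_bits = [(x >> b) & 1 for b in range(4)]   # bit-slice the input vector
ideal_adc = lambda v: v                     # lossless quantizer for the demo

r_conv, n_conv = vmm_conventional(x_bits, W, ideal_adc)
r_ana, n_ana = vmm_analog_accumulation(x_bits, W, ideal_adc)
```

Here both dataflows recover the exact product `x @ W`, while the conventional path needs four conversions (one per input bit) against a single conversion for the analog-accumulation path; a real design would also have to model ADC resolution and analog noise, which this sketch omits.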