Paper Title
NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators
Paper Authors
Paper Abstract
Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads as RRAM-based Processing-in-Memory (PIM) architectures natively support highly-parallel multiply-accumulate (MAC) operations that form the backbone of most NN workloads. Unfortunately, NN workloads such as transformers require support for non-MAC operations (e.g., softmax) that RRAM cannot provide natively. Consequently, state-of-the-art works either integrate additional digital logic circuits to support the non-MAC operations or offload the non-MAC operations to CPU/GPU, resulting in significant performance and energy efficiency overheads due to data movement. In this work, we propose NEON, a novel compiler optimization to enable the end-to-end execution of the NN workload in RRAM. The key idea of NEON is to transform each non-MAC operation into a lightweight yet highly-accurate neural network. Utilizing neural networks to approximate the non-MAC operations provides two advantages: 1) We can exploit the key strength of RRAM, i.e., highly-parallel MAC operation, to flexibly and efficiently execute non-MAC operations in memory. 2) We can simplify RRAM's microarchitecture by eliminating the additional digital logic circuits while reducing the data movement overheads. Acceleration of the non-MAC operations in memory enables NEON to achieve a 2.28x speedup compared to an idealized digital logic-based RRAM. We analyze the trade-offs associated with the transformation and demonstrate feasible use cases for NEON across different substrates.
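To make the key idea concrete, the following is a minimal sketch of how a non-MAC operation could be replaced by a small network built from weighted sums. It is not NEON's actual transformation, which the abstract does not detail. It approximates the exponential inside softmax on an assumed input range of [-8, 0] with a one-hidden-layer ReLU network whose weights are constructed analytically as a piecewise-linear interpolant. The function names `build_relu_net` and `run_net`, the choice of ReLU, the input range, and the hidden-layer width are all assumptions made for illustration.

```python
import math

def build_relu_net(f, lo, hi, hidden):
    """Construct a 1-hidden-layer ReLU network that linearly interpolates f on [lo, hi].

    Hypothetical helper: hidden unit i computes relu(x - knot_i), and the output
    weight of unit i is the change in slope of the interpolant at knot_i.
    """
    knots = [lo + (hi - lo) * i / hidden for i in range(hidden + 1)]
    slopes = [
        (f(knots[i + 1]) - f(knots[i])) / (knots[i + 1] - knots[i])
        for i in range(hidden)
    ]
    biases = [-k for k in knots[:-1]]           # hidden-layer biases (input weight is 1)
    out_w = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, hidden)]
    out_b = f(lo)                               # output bias anchors the curve at lo
    return biases, out_w, out_b

def run_net(net, x):
    """Evaluate the network: a weighted sum, a ReLU, then another weighted sum."""
    biases, out_w, out_b = net
    hidden = [max(0.0, x + b) for b in biases]
    return out_b + sum(w * h for w, h in zip(out_w, hidden))

if __name__ == "__main__":
    net = build_relu_net(math.exp, -8.0, 0.0, 32)
    grid = [-8.0 + 8.0 * i / 1000 for i in range(1001)]
    worst = max(abs(run_net(net, x) - math.exp(x)) for x in grid)
    print(f"max abs error of the 32-unit approximation on [-8, 0]: {worst:.5f}")
```

Both layers of this sketch are weighted sums, the kind of operation the abstract says RRAM supports natively. The ReLU between them, and the max-subtraction and normalization that a full softmax also needs, are outside what the sketch covers. Widening the hidden layer lowers the approximation error at the cost of more MAC work, which is one simple instance of the accuracy versus cost trade-off the abstract mentions.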