Paper title
ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition
Paper authors
Paper abstract
The outstanding accuracy achieved by modern Automatic Speech Recognition (ASR) systems is enabling them to quickly become a mainstream technology. ASR is essential for many applications, such as speech-based assistants, dictation systems, and real-time language translation. However, highly accurate ASR systems are computationally expensive, requiring on the order of billions of arithmetic operations to decode each second of audio, which conflicts with a growing interest in deploying ASR on edge devices. On these devices, hardware acceleration is key to achieving acceptable performance. Yet ASR is a rich and fast-changing field, so any overly specialized hardware accelerator may quickly become obsolete. In this paper, we tackle these challenges by proposing ASRPU, a programmable accelerator for on-edge ASR. ASRPU contains a pool of general-purpose cores that execute small pieces of parallel code. Each of these programs computes one part of the overall decoder (e.g., a layer in a neural network). The accelerator automates some carefully chosen parts of the decoder to simplify programming without sacrificing generality. We provide an analysis of a modern ASR system implemented on ASRPU and show that this architecture can achieve real-time decoding with a very low power budget.
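The execution model described in the abstract (a pool of cores running small parallel programs, each computing one part of the decoder) can be illustrated with a toy software analogy. This is only a sketch under stated assumptions, not ASRPU's actual programming interface: a Python thread pool stands in for the core pool, and the names `dense_row`, `run_layer`, `decode_frame`, and the two-layer model `LAYERS` are hypothetical, chosen purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def dense_row(weights_row, bias, inputs):
    """One small kernel: a single output neuron followed by ReLU."""
    acc = bias + sum(w * x for w, x in zip(weights_row, inputs))
    return max(0.0, acc)


def run_layer(pool, weights, biases, inputs):
    """Run one decoder stage by fanning its kernels out over the pool."""
    futures = [
        pool.submit(dense_row, row, bias, inputs)
        for row, bias in zip(weights, biases)
    ]
    # Collect results in submission order so outputs stay aligned with rows.
    return [f.result() for f in futures]


def decode_frame(pool, layers, features):
    """Apply the stages in order; each stage is parallel internally."""
    activations = features
    for weights, biases in layers:
        activations = run_layer(pool, weights, biases, activations)
    return activations


# Hypothetical two-layer toy model (assumption: not taken from the paper).
LAYERS = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0]),
    ([[1.0, 1.0, 1.0]], [0.5]),
]

# The thread pool plays the role of the pool of general-purpose cores.
with ThreadPoolExecutor(max_workers=4) as pool:
    OUTPUT = decode_frame(pool, LAYERS, [2.0, 3.0])

print(OUTPUT)
```

The point of the analogy is the structure: stages run one after another, while the work inside each stage is split into small independent kernels that any free worker can pick up. Replacing a stage only means supplying a different kernel, which is the kind of flexibility a programmable accelerator aims to keep.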