ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition

Paper Title

ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition

Authors

Dennis Pinto, Jose-María Arnau, Antonio González

Abstract

The outstanding accuracy achieved by modern Automatic Speech Recognition (ASR) systems is enabling them to quickly become a mainstream technology. ASR is essential for many applications, such as speech-based assistants, dictation systems and real-time language translation. However, highly accurate ASR systems are computationally expensive, requiring on the order of billions of arithmetic operations to decode each second of audio, which conflicts with a growing interest in deploying ASR on edge devices. On these devices, hardware acceleration is key for achieving acceptable performance. However, ASR is a rich and fast-changing field, and thus, any overly specialized hardware accelerator may quickly become obsolete. In this paper, we tackle those challenges by proposing ASRPU, a programmable accelerator for on-edge ASR. ASRPU contains a pool of general-purpose cores that execute small pieces of parallel code. Each of these programs computes one part of the overall decoder (e.g. a layer in a neural network). The accelerator automates some carefully chosen parts of the decoder to simplify the programming without sacrificing generality. We provide an analysis of a modern ASR system implemented on ASRPU and show that this architecture can achieve real-time decoding with a very low power budget.
