Paper Title
LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition
Paper Authors
Paper Abstract
This paper proposes the Learnable Multiplicative absolute position Embedding based Conformer (LMEC). It contains a kernelized linear attention (LA) module, called LMLA, that addresses the high computational cost of long-sequence speech recognition, as well as an alternative to the feed-forward network (FFN) structure. First, the ELU function is adopted as the kernel function of the proposed LA module. Second, we propose a novel Learnable Multiplicative Absolute Position Embedding (LM-APE) based re-weighting mechanism that reduces the well-known quadratic time and space complexity of softmax self-attention. Third, we replace the FFN with Gated Linear Units (GLU) for better performance. Extensive experiments were conducted on the public LibriSpeech dataset. Compared to a Conformer model with cosFormer-style linear attention, the proposed method achieves up to a 0.63% word-error-rate improvement on test-other and speeds up inference of the LA module by up to 13% (left product) and 33% (right product).
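As a rough sketch of the mechanism the abstract outlines, the PyTorch snippet below implements ELU-kernelized linear attention computed in the "right product" order (K^T V first), which is what replaces the quadratic cost of softmax attention with a cost linear in sequence length. The abstract does not give LM-APE's exact formulation, so the multiplicative position re-weighting is approximated here by a hypothetical learnable per-position scale (`pos_scale`); all class and parameter names are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ELULinearAttention(nn.Module):
    """Sketch of ELU-kernelized linear attention with a learnable
    per-position multiplicative re-weighting (stand-in for LM-APE)."""

    def __init__(self, d_model: int, max_len: int = 4096):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Hypothetical learnable absolute-position scale, one weight per position.
        self.pos_scale = nn.Parameter(torch.ones(max_len, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        n = x.size(1)
        # Feature map phi(x) = ELU(x) + 1 keeps kernel values positive.
        q = F.elu(self.q_proj(x)) + 1.0
        k = F.elu(self.k_proj(x)) + 1.0
        v = self.v_proj(x)
        # Multiplicative position re-weighting of queries and keys (illustrative).
        q = q * self.pos_scale[:n]
        k = k * self.pos_scale[:n]
        # "Right product": contract K^T V first -> O(n * d^2), not O(n^2 * d).
        kv = torch.einsum('bnd,bne->bde', k, v)                    # (batch, d, d)
        z = 1.0 / (torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + 1e-6)
        return torch.einsum('bnd,bde->bne', q, kv) * z.unsqueeze(-1)
```

Re-associating the contraction to compute QK^T first recovers the "left product" order; both orders give identical outputs, but the right product is cheaper once the sequence length exceeds the feature dimension, which is consistent with the larger speedup the abstract reports for it.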
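Similarly, a minimal sketch of the GLU substitution for the Conformer FFN, assuming the standard gated-linear-unit form of Dauphin et al.; the hidden width and the sigmoid gate are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class GLUBlock(nn.Module):
    """Sketch of a Gated Linear Unit used in place of the Conformer FFN.
    Dimensions and gate activation are illustrative assumptions."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.value = nn.Linear(d_model, d_hidden)  # content path
        self.gate = nn.Linear(d_model, d_hidden)   # gating path
        self.out = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GLU(x) = (xW + b) * sigmoid(xV + c): the gate modulates the content path.
        return self.out(self.value(x) * torch.sigmoid(self.gate(x)))
```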