Paper Title

Multi-Accent Adaptation based on Gate Mechanism

Paper Authors

Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan

Paper Abstract

When only a limited amount of accented speech data is available to improve multi-accent speech recognition performance, the conventional approach is accent-specific adaptation, which adapts the baseline model to multiple target accents independently. To simplify the adaptation procedure, we explore adapting the baseline model to multiple target accents simultaneously with multi-accent mixed data. To this end, we propose using an accent-specific top layer with a gate mechanism (AST-G) to realize multi-accent adaptation. Compared with the baseline model and accent-specific adaptation, AST-G achieves 9.8% and 1.9% average relative WER reduction, respectively. However, in real-world applications, the accent category label is not available in advance for inference. Therefore, we apply an accent classifier to predict the accent label. To jointly train the acoustic model and the accent classifier, we propose multi-task learning with a gate mechanism (MTL-G). Because the accent label prediction can be inaccurate, MTL-G performs worse than accent-specific adaptation; still, compared with the baseline model, it achieves a 5.1% average relative WER reduction.
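
The abstract does not give the concrete network details, so the following is only a minimal PyTorch sketch of how the two gated variants could be wired. The layer sizes, the element-wise sigmoid gate, and the use of the accent classifier's posteriors as mixture weights over the accent-specific top layers are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of AST-G and MTL-G as described in the abstract.
# All architectural details (layer sizes, sigmoid gate, softmax-weighted
# mixing of accent-specific top layers) are assumptions for illustration.
import torch
import torch.nn as nn


class ASTG(nn.Module):
    """Accent-specific top layers with a gate mechanism (AST-G).

    A shared encoder feeds one top layer per target accent. The known
    accent label selects the top layer, and a learned sigmoid gate mixes
    its output with a shared top layer.
    """

    def __init__(self, feat_dim, hidden_dim, num_senones, num_accents):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.shared_top = nn.Linear(hidden_dim, hidden_dim)
        self.accent_tops = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_accents)]
        )
        self.gate = nn.Linear(hidden_dim, hidden_dim)  # element-wise gate
        self.output = nn.Linear(hidden_dim, num_senones)

    def forward(self, feats, accent_id):
        h = self.encoder(feats)
        g = torch.sigmoid(self.gate(h))                # gate values in (0, 1)
        mixed = g * self.accent_tops[accent_id](h) + (1 - g) * self.shared_top(h)
        return self.output(torch.relu(mixed))


class MTLG(nn.Module):
    """Multi-task learning with a gate mechanism (MTL-G).

    An accent classifier trained jointly with the acoustic model predicts
    accent posteriors, which weight the accent-specific top layers, so no
    accent label is needed at inference time.
    """

    def __init__(self, feat_dim, hidden_dim, num_senones, num_accents):
        super().__init__()
        self.ast = ASTG(feat_dim, hidden_dim, num_senones, num_accents)
        self.accent_classifier = nn.Linear(hidden_dim, num_accents)

    def forward(self, feats):
        h = self.ast.encoder(feats)
        accent_logits = self.accent_classifier(h)       # auxiliary accent task
        weights = torch.softmax(accent_logits, dim=-1)  # accent posteriors
        tops = torch.stack([top(h) for top in self.ast.accent_tops], dim=-1)
        accent_mix = (tops * weights.unsqueeze(1)).sum(dim=-1)
        g = torch.sigmoid(self.ast.gate(h))
        mixed = g * accent_mix + (1 - g) * self.ast.shared_top(h)
        return self.ast.output(torch.relu(mixed)), accent_logits
```

In this sketch, AST-G would be used when the accent label is known at inference time, while MTL-G replaces that label with the jointly trained classifier's posteriors, mirroring the motivation given in the abstract.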
