Domain Adversarial Neural Networks for Dysarthric Speech Recognition

Paper Title

Domain Adversarial Neural Networks for Dysarthric Speech Recognition

Authors

Dominika Woszczyk, Stavros Petridis, David Millard

Abstract

Speech recognition systems have improved dramatically over the last few years; however, their performance degrades significantly on accented or impaired speech. This work explores domain adversarial neural networks (DANN) for speaker-independent speech recognition on the UAS dataset of dysarthric speech. The classification task on 10 spoken digits is performed using an end-to-end CNN that takes raw audio as input. The results are compared to a speaker-adaptive (SA) model as well as speaker-dependent (SD) and multi-task learning (MTL) models. The experiments conducted in this paper show that DANN achieves an absolute recognition rate of 74.91% and outperforms the baseline by 12.18%. Additionally, the DANN model achieves results comparable to the SA model's recognition rate of 77.65%. We also observe that when labelled dysarthric speech data is available, DANN and MTL perform similarly, but when it is not, DANN performs better than MTL.
