Automatic Speech Recognition for Speech Assessment of Persian Preschool Children

Paper Title

Automatic Speech Recognition for Speech Assessment of Persian Preschool Children

Authors

Amirhossein Abaskohi, Fatemeh Mortazavi, Hadi Moradi

Abstract

Preschool evaluation is crucial because it gives teachers and parents important knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Existing Automatic Speech Recognition (ASR) systems do not help here, since they are pre-trained on voices that differ from children's in frequency and amplitude. Because most of these systems are pre-trained on data within a specific amplitude range, their training objectives do not prepare them for voices with different amplitudes. To overcome this issue, we added a new objective, called Random Frequency Pitch (RFP), to the masking objective of the Wav2Vec 2.0 model. In addition, we used our newly introduced dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Using masking in concatenation with RFP outperforms the masking objective of Wav2Vec 2.0 alone, reaching a Word Error Rate (WER) of 1.35. Our new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.
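The abstract reports its results as Word Error Rate (WER). As a reminder of what that metric measures, the following is a minimal sketch of the standard word-level edit-distance definition, WER = (substitutions + deletions + insertions) / number of reference words. The function name `word_error_rate` is hypothetical, and whitespace tokenization is an assumption; the abstract does not describe the authors' evaluation code, nor does it state whether the reported values are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization by whitespace is an assumption,
    not something stated in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.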
