A Comparison of Discrete Latent Variable Models for Speech Representation Learning

Paper Title

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

Authors

Henry Zhou, Alexei Baevski, Michael Auli

Abstract

Neural latent variable models enable the discovery of interesting structure in speech audio data. This paper presents a comparison of two different approaches which are broadly based on predicting future time-steps or auto-encoding the input signal. Our study compares the representations learned by vq-vae and vq-wav2vec in terms of sub-word unit discovery and phoneme recognition performance. Results show that future time-step prediction with vq-wav2vec achieves better performance. The best system achieves an error rate of 13.22 on the ZeroSpeech 2019 ABX phoneme discrimination challenge.
