Paper Title
A Comparison of Discrete Latent Variable Models for Speech Representation Learning
Paper Authors
Abstract
Neural latent variable models enable the discovery of interesting structure in speech audio data. This paper compares two approaches, one broadly based on predicting future time-steps and the other on auto-encoding the input signal. Our study compares the representations learned by vq-vae and vq-wav2vec in terms of sub-word unit discovery and phoneme recognition performance. Results show that future time-step prediction with vq-wav2vec achieves better performance. The best system achieves an error rate of 13.22 on the ZeroSpeech 2019 ABX phoneme discrimination challenge.