Paper Title

Autoregressive Co-Training for Learning Discrete Speech Representations

Paper Authors

Sung-Lin Yeh, Hao Tang

Paper Abstract

While several self-supervised approaches for learning discrete speech representation have been proposed, it is unclear how these seemingly similar approaches relate to each other. In this paper, we consider a generative model with discrete latent variables that learns a discrete representation for speech. The objective of learning the generative model is formulated as information-theoretic co-training. Besides the wide generality, the objective can be optimized with several approaches, subsuming HuBERT-like training and vector quantization for learning discrete representation. Empirically, we find that the proposed approach learns discrete representation that is highly correlated with phonetic units, more correlated than HuBERT-like training and vector quantization.
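The abstract contrasts the proposed co-training objective with vector quantization for discretizing speech features. As a point of reference, here is a minimal, generic sketch of the vector-quantization step (nearest-codebook assignment over frame-level features); this is an illustration of the baseline technique the abstract names, not the paper's co-training objective, and all function and variable names are our own.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each frame-level feature vector to its nearest codebook entry.

    features: (num_frames, dim) array of continuous speech features.
    codebook: (num_codes, dim) array of learned code vectors.
    Returns the discrete unit index per frame and the quantized vectors.
    """
    # Squared Euclidean distance between every frame and every code vector.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # one discrete unit per frame
    quantized = codebook[indices]    # straight table lookup
    return indices, quantized

# Toy example: 5 frames of 4-dim features, a codebook of 3 units.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4))
codes = rng.normal(size=(3, 4))
idx, q = vector_quantize(feats, codes)
```

In a full VQ-based representation learner the codebook itself is trained (e.g. with a straight-through estimator), but the assignment step above is the part that produces the discrete units whose correlation with phonetic units the paper evaluates.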
