Learning the joint distribution of two sequences using little or no paired data

Paper title

Learning the joint distribution of two sequences using little or no paired data

Authors

Soroosh Mariooryad, Matt Shannon, Siyuan Ma, Tom Bagby, David Kao, Daisy Stanton, Eric Battenberg, RJ Skerry-Ryan

Abstract

We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when only limited paired data is available. To address the intractability of the exact model under a realistic data setup, we propose a variational inference approximation. To train this variational model with categorical data, we propose a KL encoder loss approach which has connections to the wake-sleep algorithm. Identifying the joint or conditional distributions by observing only unpaired samples from the marginals is possible only under certain conditions on the data distribution, and we discuss under what type of conditional independence assumptions that might be achieved, which guides the architecture design. Experimental results show that even a tiny amount of paired data (5 minutes) is sufficient to learn to relate the two modalities (graphemes and phonemes here) when a massive amount of unpaired data is available, paving the path to adopting this principled approach for all seq2seq models in low data resource regimes.
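The identifiability point in the abstract can be illustrated with a toy example. This is only a sketch with hypothetical two-symbol distributions and is not taken from the paper: two different joint distributions share exactly the same marginals, so unpaired samples alone cannot tell them apart without further assumptions.

```python
# Toy illustration (hypothetical numbers, not from the paper).
# Rows index x in {0, 1}; columns index y in {0, 1}.
joint_aligned = [[0.5, 0.0],
                 [0.0, 0.5]]        # y always equals x
joint_independent = [[0.25, 0.25],
                     [0.25, 0.25]]  # y carries no information about x


def marginals(joint):
    """Return (p(x), p(y)) for a joint table given as nested lists."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y


def conditional_y_given_x(joint):
    """Return p(y | x) by normalising each row of the joint table."""
    return [[p / sum(row) for p in row] for row in joint]


# Both joints produce identical unpaired data distributions...
same_marginals = marginals(joint_aligned) == marginals(joint_independent)
# ...yet they imply different conditionals p(y | x).
different_conditionals = (
    conditional_y_given_x(joint_aligned)
    != conditional_y_given_x(joint_independent)
)
print(same_marginals, different_conditionals)
```

Extra structure, such as the conditional independence assumptions the abstract mentions or a small amount of paired data, is what would let a model choose between candidates like these.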
