Paper Title

SentenceMIM: A Latent Variable Language Model

Paper Authors

Micha Livne, Kevin Swersky, David J. Fleet

Paper Abstract

SentenceMIM is a probabilistic auto-encoder for language data, trained with Mutual Information Machine (MIM) learning to provide a fixed-length representation of variable-length language observations (i.e., similar to a VAE). Previous attempts to learn VAEs for language data faced challenges due to posterior collapse. MIM learning encourages high mutual information between observations and latent variables, and is robust against posterior collapse. As such, it learns informative representations whose dimension can be an order of magnitude higher than existing language VAEs. Importantly, the SentenceMIM loss has no hyper-parameters, simplifying optimization. We compare SentenceMIM with VAEs and AEs on multiple datasets. SentenceMIM yields excellent reconstruction, comparable to AEs, with a rich structured latent space, comparable to VAEs. The structured latent representation is demonstrated with interpolation between sentences of different lengths. We demonstrate the versatility of SentenceMIM by utilizing a trained model for question-answering and transfer learning, without fine-tuning, outperforming VAEs and AEs with similar architectures.
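For intuition on the claims about posterior collapse and the hyper-parameter-free loss, below is a minimal PyTorch sketch contrasting a per-example VAE loss with a MIM-style loss, assuming a Gaussian encoder and a standard-normal prior. The function names and the factored form of the loss are illustrative assumptions based on the general MIM framework, not the authors' exact implementation (the paper uses an asymmetric variant, A-MIM, with samples drawn from the encoder only).

```python
import math

import torch


def gaussian_log_prob(z, mu, logvar):
    """log N(z; mu, diag(exp(logvar))), summed over latent dimensions."""
    return -0.5 * (math.log(2 * math.pi) + logvar
                   + (z - mu) ** 2 / logvar.exp()).sum(dim=-1)


def vae_and_mim_losses(log_px_given_z, z, mu, logvar):
    """Per-example losses, assuming q(z|x) = N(mu, exp(logvar)) and P(z) = N(0, I).

    log_px_given_z: decoder log-likelihood log p(x|z), shape (batch,)
    z:              reparameterized sample z = mu + exp(0.5 * logvar) * eps
    """
    log_pz = gaussian_log_prob(z, torch.zeros_like(mu), torch.zeros_like(logvar))
    log_qz_x = gaussian_log_prob(z, mu, logvar)

    # Negative ELBO: -log p(x|z) - log P(z) + log q(z|x).
    # The +log q(z|x) term pushes the posterior toward the prior,
    # which is what makes VAEs prone to posterior collapse on language data.
    vae_loss = -(log_px_given_z + log_pz - log_qz_x)

    # MIM-style loss: -(log p(x|z) + log P(z) + log q(z|x)) / 2.
    # The sign of log q(z|x) flips relative to the VAE, rewarding a confident
    # encoder (high mutual information between x and z), and there is no
    # KL weight or other hyper-parameter to tune.
    mim_loss = -0.5 * (log_px_given_z + log_pz + log_qz_x)

    return vae_loss, mim_loss
```

The sign flip on the log q(z|x) term is the key difference: the VAE objective penalizes posteriors that diverge from the prior (which can collapse them onto it), whereas the MIM-style objective rewards low-entropy, informative posteriors.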
