Multimodal Transformer for Parallel Concatenated Variational Autoencoders

Title

Multimodal Transformer for Parallel Concatenated Variational Autoencoders

Authors

Liang, Stephen D., Mendel, Jerry M.

Abstract

In this paper, we propose a multimodal transformer using a parallel concatenated architecture. Instead of patches, we use column stripes of the image in the R, G, and B channels as the transformer input. The column stripes preserve the spatial relations of the original image. We combine the multimodal transformer with a variational autoencoder for synthetic cross-modal data generation. The multimodal transformer is designed using multiple compression matrices and serves as the encoders of the Parallel Concatenated Variational AutoEncoder (PC-VAE). The PC-VAE consists of multiple encoders, one latent space, and two decoders. The encoders are based on random Gaussian matrices and do not need any training. We propose a new loss function based on the interaction information from partial information decomposition. The interaction information evaluates the input cross-modal information and the decoder output. The PC-VAE is trained by minimizing the loss function. Experiments are performed to validate the proposed multimodal transformer for PC-VAE.
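
The abstract does not give the stripe width, the matrix dimensions, or how the compressed stripes are combined, so the following is only a minimal sketch of the two ideas it does state: cutting each colour channel into column stripes, and compressing each stripe with a fixed random Gaussian matrix that is never trained. The function names, the choice of one matrix per channel, the N(0, 1/rows) scaling, and the column-by-column flattening order are all assumptions made for illustration, not details taken from the paper. Only the Python standard library is used so the sketch runs without extra dependencies.

```python
import random


def column_stripes(channel, stripe_width):
    """Split one image channel (a list of rows) into vertical stripes.

    Each stripe keeps every row for a run of adjacent columns and is
    flattened column by column into one vector (an assumed ordering).
    """
    height, width = len(channel), len(channel[0])
    if width % stripe_width != 0:
        raise ValueError("stripe_width must divide the image width")
    stripes = []
    for start in range(0, width, stripe_width):
        stripe = [channel[r][c]
                  for c in range(start, start + stripe_width)
                  for r in range(height)]
        stripes.append(stripe)
    return stripes


def gaussian_matrix(rows, cols, rng):
    """Fixed random matrix with i.i.d. N(0, 1/rows) entries (assumed scaling)."""
    sigma = (1.0 / rows) ** 0.5
    return [[rng.gauss(0.0, sigma) for _ in range(cols)] for _ in range(rows)]


def compress(matrix, vector):
    """Matrix-vector product: project a stripe to a shorter code."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def encode_image(image, stripe_width, code_dim, seed=0):
    """Encode an image given as {'R': rows, 'G': rows, 'B': rows}.

    Hypothetical helper: draws one untrained Gaussian matrix per channel
    and applies it to every column stripe of that channel.
    """
    rng = random.Random(seed)
    codes = {}
    for name, channel in image.items():
        stripes = column_stripes(channel, stripe_width)
        matrix = gaussian_matrix(code_dim, len(stripes[0]), rng)
        codes[name] = [compress(matrix, s) for s in stripes]
    return codes


if __name__ == "__main__":
    # Toy 4 x 6 image: 3 stripes of width 2 per channel, each of length 8.
    image = {ch: [[(r * 6 + c + k) / 24.0 for c in range(6)] for r in range(4)]
             for k, ch in enumerate("RGB")}
    codes = encode_image(image, stripe_width=2, code_dim=3)
    for ch, stripe_codes in codes.items():
        print(ch, len(stripe_codes), "stripes,", len(stripe_codes[0]), "values each")
```

Because the matrices are drawn once from a seeded generator and never updated, the encoder side has no parameters to learn; in a setup like the one the abstract describes, only the decoders would be trained against the loss.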
