Paper Title
MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis
Paper Authors
Paper Abstract
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high- and mid-level latent modality representations (late/mid fusion) or low-level sensory inputs (early fusion). Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived, i.e., cognition affects perception. These top-down interactions are not captured in current deep learning models. In this work, we propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training. The proposed mechanism extracts high-level representations for each modality and uses these representations to mask the sensory inputs, allowing the model to perform top-down feature masking. We apply the proposed model for multimodal sentiment recognition on CMU-MOSEI. Our method shows consistent improvements over the well-established MulT and over our strong late fusion baseline, achieving state-of-the-art results.
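The feedback-masking idea is easiest to see in code. The following is a minimal sketch, not the authors' released implementation: it assumes PyTorch, and the module name `FeedbackMask`, the feature dimensions, and the single-linear-layer gate are all illustrative choices, showing only how high-level summaries of the other modalities can produce a sigmoid mask over one modality's low-level input features.

```python
# Minimal sketch of top-down feature masking, assuming PyTorch.
# FeedbackMask and all dimensions below are hypothetical, for illustration only.
import torch
import torch.nn as nn


class FeedbackMask(nn.Module):
    """Maps high-level representations of the other modalities to a
    sigmoid mask applied over one modality's low-level input features."""

    def __init__(self, high_dim: int, low_dim: int):
        super().__init__()
        self.proj = nn.Linear(high_dim, low_dim)

    def forward(self, low_feats: torch.Tensor, high_reprs: torch.Tensor) -> torch.Tensor:
        # high_reprs: concatenated high-level summaries of the other modalities
        gate = torch.sigmoid(self.proj(high_reprs))  # (batch, low_dim)
        # Broadcast the gate over the time dimension of the low-level features.
        return low_feats * gate.unsqueeze(1)          # (batch, seq_len, low_dim)


# Usage: a first bottom-up pass yields high-level audio/visual summaries;
# the feedback mask then re-weights the raw text features top-down before fusion.
batch, seq_len = 8, 20
text = torch.randn(batch, seq_len, 300)   # e.g. low-level text features
audio_repr = torch.randn(batch, 64)       # high-level audio summary
visual_repr = torch.randn(batch, 64)      # high-level visual summary

mask = FeedbackMask(high_dim=128, low_dim=300)
text_masked = mask(text, torch.cat([audio_repr, visual_repr], dim=-1))
print(text_masked.shape)  # torch.Size([8, 20, 300])
```

In the full model, an analogous mask would be computed for each modality from the other two, so that every stream's sensory input is modulated by cross-modal high-level context before the final fusion stage.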