Paper Title

CMSBERT-CLR: Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations

Paper Authors

Junghun Kim, Jihie Kim

Paper Abstract

Multimodal sentiment analysis has become an increasingly popular research area as the demand for multimodal online content is growing. For multimodal sentiment analysis, words can have different meanings depending on the linguistic context and non-verbal information, so it is crucial to understand the meaning of the words accordingly. In addition, the word meanings should be interpreted within the whole utterance context that includes nonverbal information. In this paper, we present a Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations (CMSBERT-CLR), which incorporates the whole context's non-verbal and verbal information and aligns modalities more effectively through contrastive learning. First, we introduce a Context-driven Modality Shifting (CMS) to incorporate the non-verbal and verbal information within the whole context of the sentence utterance. Then, for improving the alignment of different modalities within a common embedding space, we apply contrastive learning. Furthermore, we use an exponential moving average parameter and label smoothing as optimization strategies, which can make the convergence of the network more stable and increase the flexibility of the alignment. In our experiments, we demonstrate that our approach achieves state-of-the-art results.
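
To make the abstract's ingredients concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code) of how the contrastive alignment of linguistic and non-verbal (visual/acoustic) utterance embeddings in a common space might be combined with label smoothing and an exponential moving average (EMA) of the parameters. The module and variable names (CommonSpaceProjector, contrastive_alignment_loss, ema_update, the feature sizes d_text and d_av) are illustrative assumptions, and the Context-driven Modality Shifting module itself is omitted.

```python
# Hypothetical sketch of the abstract's components: InfoNCE-style contrastive
# alignment in a common embedding space, label-smoothed classification, and an
# EMA of the parameters. Names and dimensions are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class CommonSpaceProjector(nn.Module):
    """Projects a modality-specific embedding into the shared space."""
    def __init__(self, in_dim: int, out_dim: int = 128):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                  nn.Linear(out_dim, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)  # unit norm for cosine similarity


def contrastive_alignment_loss(z_text, z_nonverbal, temperature: float = 0.07):
    """Symmetric InfoNCE: the text and non-verbal views of the same utterance
    are a positive pair; all other in-batch pairings act as negatives."""
    logits = z_text @ z_nonverbal.t() / temperature           # (B, B) similarities
    targets = torch.arange(z_text.size(0), device=z_text.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


@torch.no_grad()
def ema_update(ema_model: nn.Module, model: nn.Module, decay: float = 0.999):
    """Maintain an exponential moving average of the parameters; the EMA copy
    is typically used at evaluation time for more stable convergence."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)


if __name__ == "__main__":
    B, d_text, d_av = 8, 768, 74              # assumed batch and feature sizes
    text_proj = CommonSpaceProjector(d_text)
    av_proj = CommonSpaceProjector(d_av)
    ema_text_proj = copy.deepcopy(text_proj)  # EMA shadow of the text projector

    # Label-smoothed sentiment classification over the aligned representation.
    classifier = nn.Linear(128, 3)            # e.g. negative / neutral / positive
    cls_criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

    text_feat = torch.randn(B, d_text)        # e.g. BERT [CLS] of the utterance
    av_feat = torch.randn(B, d_av)            # e.g. pooled visual+acoustic features
    labels = torch.randint(0, 3, (B,))

    z_t, z_av = text_proj(text_feat), av_proj(av_feat)
    loss = (contrastive_alignment_loss(z_t, z_av)
            + cls_criterion(classifier(z_t), labels))
    loss.backward()
    ema_update(ema_text_proj, text_proj)
    print(f"total loss: {loss.item():.4f}")
```

The symmetric InfoNCE term is one standard way to tighten a common embedding space, while label smoothing and the EMA copy are generic optimization aids consistent with the strategies the abstract names; how CMSBERT-CLR weights or schedules these terms is not specified here.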
