Title
Multimodal Representation Learning With Text and Images
Authors
Abstract
In recent years, multimodal AI has seen an upward trend as researchers integrate different types of data, such as text, images, and speech, into their models to obtain the best results. This project leverages multimodal AI and matrix factorization techniques to learn representations from text and image data simultaneously, drawing on widely used techniques from Natural Language Processing (NLP) and Computer Vision. The learnt representations are evaluated using downstream classification and regression tasks. The methodology can be extended beyond the scope of this project, as it uses autoencoders for unsupervised representation learning.
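The approach described in the abstract can be pictured with a minimal sketch: a joint autoencoder that compresses pre-extracted text and image feature vectors into a single shared latent representation, trained with a reconstruction loss, after which the latent codes can be fed to downstream classifiers or regressors. The framework choice (PyTorch), the feature dimensions, the layer sizes, and the use of random placeholder features are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: a joint autoencoder over text and image features,
# in the spirit of the unsupervised multimodal representation learning the
# abstract describes. All dimensions and the random placeholder inputs are
# assumptions for demonstration.
import torch
import torch.nn as nn

TEXT_DIM, IMG_DIM, LATENT_DIM = 300, 512, 64  # assumed feature sizes


class MultimodalAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: concatenated text + image features -> shared latent code
        self.encoder = nn.Sequential(
            nn.Linear(TEXT_DIM + IMG_DIM, 256), nn.ReLU(),
            nn.Linear(256, LATENT_DIM),
        )
        # Decoder: latent code -> reconstruction of both modalities
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, TEXT_DIM + IMG_DIM),
        )

    def forward(self, text_feats, image_feats):
        x = torch.cat([text_feats, image_feats], dim=1)
        z = self.encoder(x)        # shared multimodal representation
        x_hat = self.decoder(z)    # reconstruction of both modalities
        return z, x_hat, x


# Unsupervised training on placeholder features (random tensors stand in
# for real text/image features, which are outside this sketch).
model = MultimodalAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

text_feats = torch.randn(32, TEXT_DIM)
image_feats = torch.randn(32, IMG_DIM)

for step in range(5):
    z, x_hat, x = model(text_feats, image_feats)
    loss = loss_fn(x_hat, x)       # reconstruction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The latent codes z would then be evaluated on downstream classification
# and regression tasks, as the abstract states.
```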