Paper Title


Intra-Modal Constraint Loss For Image-Text Retrieval

Authors

Chen, Jianan; Zhang, Lu; Wang, Qiong; Bai, Cong; Kpalma, Kidiyo

Abstract

Cross-modal retrieval has drawn much attention in both the computer vision and natural language processing communities. With the development of convolutional and recurrent neural networks, the bottleneck in image-text retrieval is no longer the extraction of image and text features, but the learning of an effective loss function in the embedding space. Many loss functions try to pull paired features from heterogeneous modalities closer together. This paper proposes a method for learning a joint embedding of images and texts using an intra-modal constraint loss function, which reduces violations by negative pairs drawn from the same modality. Experimental results show that our approach outperforms state-of-the-art bidirectional image-text retrieval methods on the Flickr30K and Microsoft COCO datasets. Our code is publicly available at: https://github.com/CanonChen/IMC.
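The abstract describes a hinge-based embedding loss that, in addition to the usual cross-modal terms, penalizes negatives drawn from the same modality. A minimal NumPy sketch of this idea is shown below; it is a simplified illustration under assumed conventions (cosine similarity, a fixed margin, sum over violating negatives), not the authors' exact formulation, which is available in their repository.

```python
import numpy as np

def l2_normalize(x):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def imc_loss(img, txt, margin=0.2):
    """Bidirectional hinge loss with an added intra-modal constraint.

    img, txt: (N, D) arrays where row k of `img` and row k of `txt`
    form a matched image-text pair. Simplified sketch only.
    """
    img, txt = l2_normalize(img), l2_normalize(txt)
    n = img.shape[0]

    sim = img @ txt.T            # cross-modal similarity matrix
    pos = np.diag(sim)           # similarity of each matched pair
    mask = ~np.eye(n, dtype=bool)  # exclude the positive pair itself

    # Standard cross-modal hinge terms (image->text and text->image):
    # a negative should be at least `margin` less similar than the positive.
    cost_i2t = np.maximum(0, margin + sim - pos[:, None])[mask].sum()
    cost_t2i = np.maximum(0, margin + sim - pos[None, :])[mask].sum()

    # Intra-modal constraint: another image (or text) from the same
    # modality should not be closer to the anchor than the anchor's
    # own cross-modal positive.
    sim_ii = img @ img.T
    sim_tt = txt @ txt.T
    cost_ii = np.maximum(0, margin + sim_ii - pos[:, None])[mask].sum()
    cost_tt = np.maximum(0, margin + sim_tt - pos[:, None])[mask].sum()

    return (cost_i2t + cost_t2i + cost_ii + cost_tt) / n
```

With mutually orthogonal embeddings every hinge term is inactive and the loss is zero; making two same-modality embeddings identical activates the intra-modal term and the loss becomes positive.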
