Image-text Retrieval: A Survey on Recent Research and Development

Paper Title

Image-text Retrieval: A Survey on Recent Research and Development

Authors

Cao, Min; Li, Shiping; Li, Juntao; Nie, Liqiang; Zhang, Min

Abstract

In the past few years, cross-modal image-text retrieval (ITR) has attracted growing interest in the research community owing to its research value and broad real-world applications. It targets scenarios in which the queries come from one modality and the retrieval galleries from another. This paper presents a comprehensive and up-to-date survey of ITR approaches from four perspectives. By dissecting an ITR system into two processes, feature extraction and feature alignment, we summarize the recent advances in ITR approaches from these two perspectives. On top of this, efficiency-focused studies of ITR systems are introduced as the third perspective. To keep pace with the times, we also provide a pioneering overview of cross-modal pre-training ITR approaches as the fourth perspective. Finally, we outline the common benchmark datasets and evaluation metrics for ITR, and compare the accuracy of representative ITR approaches. Some critical yet less-studied issues are discussed at the end of the paper.

Paper Title