Paper Title
Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations
Paper Authors
Paper Abstract
Probabilistic embeddings have proven useful for capturing polysemous word meanings, as well as ambiguity in image matching. In this paper, we study the advantages of probabilistic embeddings in a cross-modal setting (i.e., text and images), and propose a simple approach that replaces the standard vector point embeddings in existing image-text matching models with learned probability distributions. Our guiding hypothesis is that the uncertainty encoded in the probabilistic embeddings captures the cross-modal ambiguity in the input instances, and that it is by capturing this uncertainty that probabilistic models can perform better at downstream tasks, such as image-to-text or text-to-image retrieval. Through extensive experiments on standard and new benchmarks, we show a consistent advantage for probabilistic representations in cross-modal retrieval, and validate the ability of our embeddings to capture uncertainty.
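To make the core idea concrete, the sketch below illustrates (in NumPy) what it means to replace point embeddings with probabilistic ones: each image or text is represented by a diagonal Gaussian (mean and log-variance), samples are drawn via the reparameterization trick, and a match probability is estimated by averaging a sigmoid of negative distance over sampled pairs. The scoring rule, function names, and parameters here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_embeddings(mu, log_sigma, n_samples=8):
    """Draw n_samples from a diagonal Gaussian N(mu, sigma^2)
    using the reparameterization trick (illustrative)."""
    eps = rng.standard_normal((n_samples, mu.shape[-1]))
    return mu + np.exp(log_sigma) * eps  # shape: (n_samples, dim)

def match_probability(mu_a, ls_a, mu_b, ls_b, a=1.0, b=0.0, n=8):
    """Monte-Carlo estimate of a soft match probability between two
    probabilistic embeddings: mean sigmoid of the negative scaled L2
    distance over all sampled pairs (hypothetical scoring rule)."""
    z_a = sample_embeddings(mu_a, ls_a, n)  # (n, dim)
    z_b = sample_embeddings(mu_b, ls_b, n)  # (n, dim)
    # Pairwise distances between every sample of a and every sample of b.
    d = np.linalg.norm(z_a[:, None, :] - z_b[None, :, :], axis=-1)
    return float(np.mean(1.0 / (1.0 + np.exp(a * d + b))))

# A caption whose distribution sits near the image's distribution should
# score higher than one whose distribution is far away.
mu_img = np.zeros(4)
p_near = match_probability(mu_img, np.full(4, -2.0),
                           np.zeros(4), np.full(4, -2.0))
p_far = match_probability(mu_img, np.full(4, -2.0),
                          np.full(4, 5.0), np.full(4, -2.0))
```

Under this scheme, a large predicted variance spreads an instance's samples over a wider region of the embedding space, which is one way uncertainty about an ambiguous image or caption could be expressed.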