Paper Title
An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing
Paper Authors
Paper Abstract
The development of accurate and scalable cross-modal image-text retrieval methods, where a query from one modality (e.g., text) can be matched to archive entries from another modality (e.g., remote sensing images), has attracted great attention in remote sensing (RS). Most existing methods assume that a reliable multi-modal training set with accurately matched text-image pairs exists. However, this assumption may not always hold, since multi-modal training sets may include noisy pairs (i.e., the textual descriptions/captions associated with training images can be noisy), which can distort the learning process of the retrieval methods. To address this problem, we propose a novel unsupervised cross-modal hashing method robust to noisy image-text correspondences (CHNR). CHNR consists of three modules: 1) a feature extraction module, which extracts feature representations of image-text pairs; 2) a noise detection module, which detects potential noisy correspondences; and 3) a hashing module, which generates cross-modal binary hash codes. The proposed CHNR includes two training phases: i) a meta-learning phase, which uses a small portion of clean (i.e., reliable) data to train the noise detection module in an adversarial fashion; and ii) the main training phase, in which the trained noise detection module is used to identify noisy correspondences while the hashing module is trained on the noisy multi-modal training set. Experimental results show that the proposed CHNR outperforms state-of-the-art methods. Our code is publicly available at https://git.tu-berlin.de/rsim/chnr
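To make the three-module architecture and the confidence-weighted main training phase concrete, below is a minimal PyTorch-style sketch. All class names, layer sizes, the 64-bit code length, and the alignment loss are illustrative assumptions made here for exposition; they are not taken from the paper or its repository, and the adversarial meta-learning phase is omitted.

```python
# Illustrative sketch only: module names, dimensions, and losses are
# assumptions inferred from the abstract, not the paper's exact design.
import torch
import torch.nn as nn


class FeatureExtraction(nn.Module):
    """Projects pre-extracted image/text features into a shared dimension."""
    def __init__(self, img_dim=2048, txt_dim=768, feat_dim=512):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, feat_dim), nn.ReLU())
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, feat_dim), nn.ReLU())

    def forward(self, img_feat, txt_feat):
        return self.img_proj(img_feat), self.txt_proj(txt_feat)


class NoiseDetection(nn.Module):
    """Scores how likely an image-text pair is a correct correspondence."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, img, txt):
        # Confidence in [0, 1] per pair; low scores flag suspected noise.
        return self.scorer(torch.cat([img, txt], dim=-1)).squeeze(-1)


class Hashing(nn.Module):
    """Continuous relaxed codes; sign() yields binary codes at inference."""
    def __init__(self, feat_dim=512, code_len=64):
        super().__init__()
        self.img_head = nn.Linear(feat_dim, code_len)
        self.txt_head = nn.Linear(feat_dim, code_len)

    def forward(self, img, txt):
        return torch.tanh(self.img_head(img)), torch.tanh(self.txt_head(txt))


def weighted_hashing_loss(img_code, txt_code, pair_confidence):
    """Main-phase loss: confidence scores from the (already trained, frozen)
    noise detector down-weight suspected noisy pairs."""
    per_pair = ((img_code - txt_code) ** 2).mean(dim=-1)
    return (pair_confidence * per_pair).mean()


# Example forward pass on random placeholder features (batch of 8 pairs).
fe, nd, h = FeatureExtraction(), NoiseDetection(), Hashing()
img, txt = fe(torch.randn(8, 2048), torch.randn(8, 768))
loss = weighted_hashing_loss(*h(img, txt), nd(img, txt).detach())
```

In the method as described in the abstract, the noise detection module would first be trained adversarially on the small clean subset during the meta-learning phase; only then would its confidence scores be used, as sketched above, to identify noisy correspondences while the hashing module trains on the full noisy set.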