Paper Title
Global-and-Local Collaborative Learning for Co-Salient Object Detection
Paper Authors
Paper Abstract
The goal of co-salient object detection (CoSOD) is to discover the salient objects that commonly appear in a query group of two or more relevant images. Effectively extracting inter-image correspondence is therefore crucial for the CoSOD task. In this paper, we propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) module and a local correspondence modeling (LCM) module to capture comprehensive inter-image correspondence among different images from the global and local perspectives. First, we treat the different images as different time slices and use 3D convolution to integrate all intra-image features, which allows the global group semantics to be extracted more fully. Second, we design a pairwise correlation transformation (PCT) to explore the similarity correspondence between pairwise images, and we combine the multiple local pairwise correspondences to generate the local inter-image relationship. Third, the inter-image relationships from the GCM and the LCM are integrated through a global-and-local correspondence aggregation (GLA) module to explore more comprehensive inter-image collaboration cues. Finally, the intra- and inter-image features are adaptively integrated by an intra-and-inter weighting fusion (AEWF) module to learn co-saliency features and predict the co-saliency map. The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model, trained on a small dataset (about 3k images), still outperforms eleven state-of-the-art competitors trained on larger datasets (about 8k-200k images).
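The local branch described above combines pairwise similarities between images into a group-level relationship. The following is a minimal pure-Python sketch of that idea under simplifying assumptions: each image is reduced to a single feature vector, pairwise correspondence is plain cosine similarity, and per-image scores are averaged over all pairs. The function names and the averaging scheme are illustrative, not the paper's actual PCT, which operates on dense feature maps.

```python
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)


def local_pairwise_correspondence(features):
    """For each image, average its pairwise similarity with every other
    image in the group -- a stand-in for combining multiple local
    pairwise correspondences into an inter-image relationship."""
    n = len(features)
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):
        s = cosine(features[i], features[j])
        scores[i] += s
        scores[j] += s
    return [s / (n - 1) for s in scores]


# Toy group of three "images": the first two share a common pattern,
# so they should receive higher correspondence scores than the third.
group = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.0], [0.0, 0.1, 1.0]]
scores = local_pairwise_correspondence(group)
```

In the full model these per-pair correspondences are computed between feature maps rather than single vectors, and the resulting local relationship is then fused with the global (3D-convolution) branch by the GLA module.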