Paper Title


Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation

Authors

Jinheng Xie, Jianfeng Xiang, Junliang Chen, Xianxu Hou, Xiaodong Zhao, Linlin Shen

Abstract


While the class activation map (CAM) generated by an image classification network has been widely used for weakly supervised object localization (WSOL) and semantic segmentation (WSSS), such classifiers usually focus only on discriminative object regions. In this paper, we propose Contrastive learning for Class-agnostic Activation Map (C$^2$AM) generation using only unlabeled image data, without the involvement of image-level supervision. The core idea comes from two observations: i) the semantic information of foreground objects usually differs from that of their backgrounds; ii) foreground objects with similar appearance, or backgrounds with similar color/texture, have similar representations in the feature space. We form positive and negative pairs based on these relations and force the network to disentangle foreground and background with a class-agnostic activation map, using a novel contrastive loss. As the network is guided to discriminate foreground from background across images, the class-agnostic activation maps learned by our approach cover more complete object regions. We extract class-agnostic object bounding boxes from C$^2$AM for object localization, and background cues to refine the CAM generated by a classification network for semantic segmentation. Extensive experiments on the CUB-200-2011, ImageNet-1K, and PASCAL VOC2012 datasets show that both WSOL and WSSS benefit from the proposed C$^2$AM.
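The pairing scheme described in the abstract can be sketched in code. The snippet below is a simplified, hypothetical illustration (not the paper's actual loss, whose exact form is not given here): foreground–foreground and background–background feature pairs are treated as positives whose cosine similarity should be high, while foreground–background pairs are negatives whose similarity should be low. Feature vectors are assumed to be pooled per-image foreground/background embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def contrastive_fg_bg_loss(fg_feats, bg_feats, eps=1e-8):
    """Hypothetical sketch of the cross-image contrastive objective.

    fg_feats / bg_feats: lists of per-image foreground / background
    feature vectors. Similarities are mapped from [-1, 1] to [0, 1]
    and penalized with a negative log, so positives (fg-fg, bg-bg)
    are pulled together and negatives (fg-bg) are pushed apart.
    """
    terms = []
    n = len(fg_feats)
    for i in range(n):
        for j in range(n):
            if i != j:
                # Positive pairs: similar foregrounds / similar backgrounds.
                terms.append(-math.log((1 + cosine(fg_feats[i], fg_feats[j])) / 2 + eps))
                terms.append(-math.log((1 + cosine(bg_feats[i], bg_feats[j])) / 2 + eps))
            # Negative pairs: foreground vs. background (within and across images).
            terms.append(-math.log(1 - (1 + cosine(fg_feats[i], bg_feats[j])) / 2 + eps))
    return sum(terms) / len(terms)
```

With well-separated features (e.g. foregrounds aligned with each other and orthogonal to backgrounds) the loss is small; if foreground and background embeddings collapse onto each other, the negative-pair terms blow up, which is what drives the disentanglement.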
