Paper Title
Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut
Paper Authors
Paper Abstract
Transformers trained with self-supervised learning using a self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses self-supervised transformer features to discover an object in an image. Visual tokens are viewed as nodes in a weighted graph whose edges represent a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution, since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state-of-the-art method LOST by margins of 6.9%, 8.1%, and 8.1% on VOC07, VOC12, and COCO20K respectively. The performance can be further improved by adding a second-stage class-agnostic detector (CAD). Our proposed method is easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU by 4.9%, 5.2%, and 12.9% on ECSSD, DUTS, and DUT-OMRON respectively compared to the previous state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet.
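The normalized-cut step described in the abstract can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the authors' implementation: the cosine-similarity affinity, the binarization threshold `tau`, the small floor `eps`, and the mean-based bipartition rule are all assumptions made for the sketch.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_second_eigenvector(features, tau=0.2, eps=1e-5):
    """Sketch of normalized cut over visual tokens.

    features: (n_tokens, dim) array of transformer token features.
    Returns the second smallest generalized eigenvector and a
    foreground/background bipartition of the tokens.
    """
    # Build the affinity graph W from cosine similarity of L2-normalized
    # token features, binarized at tau (tau and eps are assumed values).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    W = f @ f.T
    W = np.where(W > tau, 1.0, eps)  # eps keeps the graph connected

    # Degree matrix D; the normalized cut relaxes to the generalized
    # eigenproblem (D - W) x = lambda * D x.
    D = np.diag(W.sum(axis=1))

    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # eigenvector of the second smallest eigenvalue (the cut solution).
    _, vecs = eigh(D - W, D)
    v = vecs[:, 1]

    # Bipartition by the mean of v; which side is "foreground" still needs
    # a tie-breaking rule (e.g. the side containing the max |v| token).
    mask = v > v.mean()
    return v, mask
```

In the paper's pipeline, the per-token mask would be reshaped to the patch grid to form a coarse segmentation, from which a bounding box can be read off for object discovery.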