Paper Title

Location-Aware Self-Supervised Transformers for Semantic Segmentation

Authors

Mathilde Caron, Neil Houlsby, Cordelia Schmid

Abstract

Pixel-level labels are particularly expensive to acquire. Hence, pretraining is a critical step to improve models on a task like semantic segmentation. However, prominent algorithms for pretraining neural networks use image-level objectives, e.g. image classification, image-text alignment à la CLIP, or self-supervised contrastive learning. These objectives do not model spatial information, which might be sub-optimal when finetuning on downstream tasks with spatial reasoning. In this work, we pretrain networks with a location-aware (LOCA) self-supervised method which fosters the emergence of strong dense features. Specifically, we use both a patch-level clustering scheme to mine dense pseudo-labels and a relative location prediction task to encourage learning about object parts and their spatial arrangements. Our experiments show that LOCA pretraining leads to representations that transfer competitively to challenging and diverse semantic segmentation datasets.
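
The abstract describes two technical components: a patch-level clustering scheme that mines dense pseudo-labels, and a relative location prediction task. As a rough illustration of the latter, here is a minimal PyTorch sketch of a relative-location objective, where each query-patch token is classified into a cell of the reference view's patch grid. This is not the authors' implementation; `RelLocHead`, `relative_location_loss`, and the tensor shapes are hypothetical stand-ins for the patch embeddings a ViT-style encoder would produce.

```python
# Minimal sketch of a relative-location prediction objective.
# NOT the paper's implementation; all names and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelLocHead(nn.Module):
    """Classifies each query-patch token into one of num_positions cells
    of the reference view's patch grid (a per-token classification)."""
    def __init__(self, dim: int, num_positions: int):
        super().__init__()
        self.classifier = nn.Linear(dim, num_positions)

    def forward(self, query_tokens: torch.Tensor) -> torch.Tensor:
        # query_tokens: (batch, num_query_patches, dim)
        return self.classifier(query_tokens)  # (batch, num_query_patches, num_positions)

def relative_location_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # targets: (batch, num_query_patches) grid indices of each query patch
    # inside the reference view, known from the crop geometry.
    return F.cross_entropy(logits.flatten(0, 1), targets.flatten())

# Toy usage: random tensors stand in for encoder outputs.
batch, n_query, dim, grid_cells = 2, 16, 384, 14 * 14
head = RelLocHead(dim, grid_cells)
tokens = torch.randn(batch, n_query, dim)          # hypothetical patch embeddings
targets = torch.randint(0, grid_cells, (batch, n_query))
loss = relative_location_loss(head(tokens), targets)
loss.backward()
```

Because the crop geometry of the query view relative to the reference view is known, the position targets come for free, which is what makes this a self-supervised objective.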
