Paper Title

CASTing Your Model: Learning to Localize Improves Self-Supervised Representations

Authors

Ramprasaath R. Selvaraju, Karan Desai, Justin Johnson, Nikhil Naik

Abstract

Recent advances in self-supervised learning (SSL) have largely closed the gap with supervised ImageNet pretraining. Despite their success, these methods have been primarily applied to unlabeled ImageNet images, and show marginal gains when trained on larger sets of uncurated images. We hypothesize that current SSL methods perform best on iconic images, and struggle on complex scene images with many objects. Analyzing contrastive SSL methods shows that they have poor visual grounding and receive a poor supervisory signal when trained on scene images. We propose Contrastive Attention-Supervised Tuning (CAST) to overcome these limitations. CAST uses unsupervised saliency maps to intelligently sample crops, and to provide grounding supervision via a Grad-CAM attention loss. Experiments on COCO show that CAST significantly improves the features learned by SSL methods on scene images, and further experiments show that CAST-trained models are more robust to changes in backgrounds.
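To make the two ingredients named in the abstract concrete, below is a minimal, hedged PyTorch sketch of the general idea: a standard InfoNCE contrastive loss between two crops, plus a Grad-CAM-style attention loss that pulls the model's attention for the positive pair toward an unsupervised saliency map. Everything here (the tiny `SmallEncoder`, the `cast_losses` helper, the MSE attention penalty, the equal loss weighting) is an illustrative assumption and not the authors' implementation; CAST's crop-sampling step is also omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallEncoder(nn.Module):
    """Toy conv encoder; `features` produces the spatial map Grad-CAM is taken over."""

    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, dim))

    def forward(self, x):
        fmap = self.features(x)                   # (B, C, h, w) spatial features
        z = F.normalize(self.head(fmap), dim=1)   # (B, dim) embedding
        return z, fmap


def cast_losses(model, query, key, saliency, temperature=0.1):
    """Return (contrastive_loss, attention_loss) for one batch.

    query, key: two crops of the same images, (B, 3, H, W)
    saliency:   unsupervised saliency maps aligned with the query crops, (B, 1, H, W)
    """
    zq, fmap = model(query)
    with torch.no_grad():
        zk, _ = model(key)                        # in practice a momentum/stop-grad key encoder

    # InfoNCE over the batch: the positive for row i is column i.
    logits = zq @ zk.t() / temperature            # (B, B)
    labels = torch.arange(zq.size(0), device=zq.device)
    contrastive = F.cross_entropy(logits, labels)

    # Grad-CAM-style attention for the positive similarity scores.
    pos_score = (zq * zk).sum(dim=1).sum()
    grads = torch.autograd.grad(pos_score, fmap, create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)            # channel importance
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))   # (B, 1, h, w)
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-6)

    # Supervise attention with the (resized) saliency map; MSE is one simple choice.
    target = F.interpolate(saliency, size=cam.shape[-2:], mode="bilinear",
                           align_corners=False)
    attention = F.mse_loss(cam, target)
    return contrastive, attention


if __name__ == "__main__":
    model = SmallEncoder()
    q = torch.randn(4, 3, 64, 64)
    k = torch.randn(4, 3, 64, 64)
    sal = torch.rand(4, 1, 64, 64)                # placeholder for unsupervised saliency
    l_con, l_att = cast_losses(model, q, k, sal)
    (l_con + l_att).backward()                    # joint objective; the weighting is a free choice
```

The sketch only illustrates how a Grad-CAM map computed from the positive-pair similarity can be supervised by saliency alongside the usual contrastive objective; the paper should be consulted for the actual attention loss and crop-sampling procedure.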
