Paper Title
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning
Paper Authors
Paper Abstract
By leveraging contrastive learning, clustering, and other pretext tasks, unsupervised methods for learning image representations have achieved impressive results on standard benchmarks. The result has been a crowded field: many methods with substantially different implementations yield results that seem nearly identical on popular benchmarks, such as linear evaluation on ImageNet. However, a single result does not tell the whole story. In this paper, we compare methods using performance-based benchmarks such as linear evaluation, nearest neighbor classification, and clustering on several different datasets, demonstrating the lack of a clear front-runner within the current state of the art. In contrast to prior work that performs only supervised vs. unsupervised comparisons, we compare several different unsupervised methods against each other. To enrich this comparison, we analyze embeddings with measurements such as uniformity, tolerance, and centered kernel alignment (CKA), and propose two new metrics of our own: nearest neighbor graph similarity and linear prediction overlap. Our analysis reveals that single popular methods, taken in isolation, should not be treated as though they represent the field as a whole, and that future work ought to consider how to leverage the complementary nature of these methods. We also leverage CKA to provide a framework for robustly quantifying augmentation invariance, and provide a reminder that certain types of invariance will be undesirable for downstream tasks.
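To make the CKA measurement mentioned in the abstract concrete, the following is a minimal sketch of linear CKA in its standard form, ||Y^T X||_F^2 / (||X^T X||_F ||Y^T Y||_F), computed on column-centered embeddings. It is not the authors' implementation: the function names, the plain-Python list representation, and the toy embeddings are all assumptions made for illustration, and the abstract does not say which CKA variant (linear or kernel) the paper uses.

```python
import math


def center_columns(X):
    """Subtract each column's mean so every feature dimension has zero mean."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def cross_frobenius_sq(A, B):
    """Squared Frobenius norm of A^T B for row-major A (n x p) and B (n x q)."""
    total = 0.0
    for col_a in zip(*A):
        for col_b in zip(*B):
            dot = sum(a * b for a, b in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(X, Y):
    """Linear CKA between two embeddings of the same n inputs (rows aligned)."""
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = cross_frobenius_sq(Xc, Yc)
    denominator = math.sqrt(
        cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc)
    )
    return numerator / denominator


# Hypothetical toy embeddings of four inputs (illustration only).
emb_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
# emb_a rotated by 90 degrees and scaled by 2: same geometry, different basis.
emb_b = [[0.0, 2.0], [-2.0, 0.0], [0.0, -2.0], [2.0, 0.0]]
# A one-dimensional embedding that keeps only part of the structure.
emb_c = [[1.0], [1.0], [-1.0], [-1.0]]

cka_same_geometry = linear_cka(emb_a, emb_b)
cka_partial = linear_cka(emb_a, emb_c)
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling, `emb_a` and `emb_b` are scored as equivalent, while `emb_c` scores lower. One possible way to probe augmentation invariance with such a function, stated here as an assumption rather than the paper's protocol, is to pass a model's embeddings of the original images as `X` and its embeddings of the augmented images as `Y`.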