Paper Title

NoiseRank: Unsupervised Label Noise Reduction with Dependence Models

Authors

Karishma Sharma, Pinar Donmez, Enming Luo, Yan Liu, I. Zeki Yalniz

Abstract

Label noise is increasingly prevalent in datasets acquired from noisy channels. Existing approaches that detect and remove label noise generally rely on some form of supervision, which is not scalable and is error-prone. In this paper, we propose NoiseRank, a method for unsupervised label noise reduction using Markov Random Fields (MRF). We construct a dependence model to estimate the posterior probability of an instance being incorrectly labeled given the dataset, and rank instances by their estimated probabilities. Our method 1) does not require supervision from ground-truth labels, or priors on the label or noise distribution; 2) is interpretable by design, enabling transparency in label noise removal; and 3) is agnostic to classifier architecture, optimization framework, and content modality. These advantages enable wide applicability in real noise settings, unlike prior works, which are constrained by one or more of these conditions. NoiseRank improves state-of-the-art classification on Food101-N (~20% noise), and is effective on the high-noise Clothing-1M dataset (~40% noise).
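To make the "score, then rank" idea concrete, here is a minimal sketch. It is not the paper's MRF dependence model: as a stand-in, it scores each instance by the fraction of its k nearest neighbours that carry a different label, then sorts instances from most to least suspicious. The function name `rank_by_label_disagreement`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math

def rank_by_label_disagreement(features, labels, k=3):
    """Return (index, score) pairs sorted from most to least suspicious.

    score = fraction of the k nearest neighbours (Euclidean distance)
    whose label differs from the instance's own label. This is a simple
    proxy for "probability of being mislabeled", not an MRF posterior.
    """
    n = len(features)
    scores = []
    for i in range(n):
        # Distances from instance i to every other instance, nearest first.
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        disagree = sum(labels[j] != labels[i] for j in neighbours)
        scores.append((i, disagree / len(neighbours)))
    # Highest disagreement first; the stable sort keeps index order on ties.
    return sorted(scores, key=lambda s: -s[1])

# Toy data: two tight clusters; instance 3 sits in cluster "a" but is labelled "b".
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]
ranking = rank_by_label_disagreement(features, labels, k=3)
print(ranking[0])  # the most suspicious instance and its score
```

Like the approach described in the abstract, this sketch uses no ground-truth labels and only needs feature vectors, so it is independent of whichever classifier is trained afterwards. In practice one would remove or review the top-ranked instances before training.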
