Paper Title
Webly Supervised Image Classification with Self-Contained Confidence
Paper Authors
Paper Abstract
This paper focuses on webly supervised learning (WSL), where datasets are built by crawling samples from the Internet and directly using the search queries as web labels. Although WSL benefits from fast and low-cost data collection, noise in the web labels limits the performance of the image classification model. To alleviate this problem, recent works use a self-label supervised loss $\mathcal{L}_s$ together with the webly supervised loss $\mathcal{L}_w$, where $\mathcal{L}_s$ relies on pseudo labels predicted by the model itself. Since whether the web label or the pseudo label is correct varies from one web sample to another, it is desirable to adjust the balance between $\mathcal{L}_s$ and $\mathcal{L}_w$ at the sample level. Inspired by the ability of Deep Neural Networks (DNNs) to predict confidence, we introduce Self-Contained Confidence (SCC) by adapting model uncertainty to the WSL setting, and use it to balance $\mathcal{L}_s$ and $\mathcal{L}_w$ sample by sample. The result is a simple yet effective WSL framework. We investigate a series of SCC-friendly regularization approaches, among which the proposed graph-enhanced mixup is the most effective at providing high-quality confidence to enhance our framework. The proposed WSL framework achieves state-of-the-art results on two large-scale WSL datasets, WebVision-1000 and Food101-N. Code is available at https://github.com/bigvideoresearch/SCC.
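The sample-level balancing described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formula, so this assumes the simplest form: a per-sample convex combination $c_i \mathcal{L}_w + (1 - c_i) \mathcal{L}_s$, where the confidence $c_i$ is taken to be the probability that an earlier model assigns to the web label. The function names, the choice of confidence, and the toy inputs are hypothetical and are not taken from the released code.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets, eps=1e-12):
    # Per-sample cross-entropy against one-hot or soft targets of shape (N, C).
    return -(targets * np.log(probs + eps)).sum(axis=1)

def confidence_balanced_loss(logits, web_labels, pseudo_targets, confidence):
    """Assumed form: c_i * L_w + (1 - c_i) * L_s for each sample i.

    logits:         (N, C) current model outputs
    web_labels:     (N,)   integer labels derived from search queries
    pseudo_targets: (N, C) soft pseudo labels predicted by a model
    confidence:     (N,)   values in [0, 1]; high means "trust the web label"
    """
    probs = softmax(logits)
    num_classes = probs.shape[1]
    one_hot = np.eye(num_classes)[web_labels]
    loss_w = cross_entropy(probs, one_hot)         # webly supervised loss
    loss_s = cross_entropy(probs, pseudo_targets)  # self-label supervised loss
    per_sample = confidence * loss_w + (1.0 - confidence) * loss_s
    return per_sample.mean(), loss_w, loss_s

# Toy usage with two samples and three classes.
logits = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 3.0]])
web_labels = np.array([0, 1])
# Hypothetical outputs of an earlier model, used as soft pseudo labels.
pseudo_targets = softmax(np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]))
# Assumed confidence: the earlier model's probability for the web label.
confidence = pseudo_targets[np.arange(len(web_labels)), web_labels]
total, loss_w, loss_s = confidence_balanced_loss(
    logits, web_labels, pseudo_targets, confidence
)
```

In this toy case the second sample's web label disagrees with the pseudo label, so its confidence is low and its loss is dominated by the self-label term. Setting every confidence to 1 recovers plain webly supervised training, and setting it to 0 recovers pure self-label training.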