Paper title
An Effective Approach for Multi-label Classification with Missing Labels
Paper authors
Paper abstract
Compared with multi-class classification, multi-label classification, in which an instance can belong to more than one class, is better suited to real-life scenarios. Obtaining fully labeled, high-quality datasets for multi-label classification problems, however, is extremely expensive in terms of annotation effort, and sometimes even infeasible, especially when the label space is very large. This motivates research on partial-label classification, where only a limited number of labels are annotated and the others are missing. To address this problem, we first propose a pseudo-label based approach that reduces the cost of annotation without adding complexity to existing classification networks. We then quantitatively study the impact of missing labels on the performance of the classifier. Furthermore, by designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label, which is commonly assumed in most existing approaches. Through comprehensive experiments on three large-scale multi-label image datasets, namely MS-COCO, NUS-WIDE, and Pascal VOC12, we show that our method can handle the imbalance between positive and negative labels while still outperforming existing missing-label learning approaches in most cases, and in some cases even approaches trained on fully labeled datasets.