Paper Title

Data Consistency for Weakly Supervised Learning

Paper Authors

Chidubem Arachie, Bert Huang

Paper Abstract

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals, while also considering features of the training data to produce accurate labels for training. Our method searches over classifiers of the data representation to find plausible labelings. We call this paradigm data consistent weak supervision. A key facet of our framework is that we are able to estimate labels for data examples with low or no coverage from the weak supervision. In addition, we make no assumptions about the joint distribution of the weak signals and true labels of the data. Instead, we use weak signals and the data features to solve a constrained optimization that enforces data consistency among the labels we generate. Empirical evaluation of our method on different datasets shows that it significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
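The core idea of data-consistent weak supervision can be illustrated with a small sketch: instead of estimating labels directly from the weak signals alone, the labels are parameterized as the output of a classifier on the data features, so every example receives a label even where weak signals abstain. This is not the paper's actual constrained-optimization algorithm; it is a minimal, hypothetical toy in which a linear classifier's soft labels are fit to agree with the covered weak signals via a simple penalty, and all data, names, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup: n examples, d features, m weak signals.
rng = np.random.default_rng(0)
n, d, m = 200, 5, 3
X = rng.normal(size=(n, d))
true_y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Weak signals: noisy copies of the true labels; -1 marks abstention (no coverage).
W = np.tile(true_y, (m, 1))
flip = rng.random((m, n)) < 0.2
W[flip] = 1 - W[flip]
W[0, :50] = -1  # the first signal abstains on the first 50 examples

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Data-consistency sketch: the estimated labels are constrained to be the
# output sigmoid(X @ theta) of a classifier on the features. We minimize the
# squared disagreement with each weak signal wherever it has coverage.
theta = np.zeros(d)
lr = 0.5
for _ in range(500):
    y_hat = sigmoid(X @ theta)
    grad = np.zeros(d)
    for j in range(m):
        covered = W[j] >= 0
        err = y_hat[covered] - W[j, covered]
        # Gradient of 0.5 * sum(err^2) through the sigmoid.
        grad += X[covered].T @ (err * y_hat[covered] * (1 - y_hat[covered]))
    theta -= lr * grad / n

# Every example gets a label, including those where a weak signal abstained.
labels = (sigmoid(X @ theta) > 0.5).astype(float)
accuracy = (labels == true_y).mean()
```

Because the labels must come from one classifier over the features, examples with little or no weak-signal coverage are still labeled consistently with the rest of the data, which mirrors the coverage claim in the abstract.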
