Paper Title

Quantifying With Only Positive Training Data

Paper Authors

Reis, Denis dos, de Souto, Marcílio, de Sousa, Elaine, Batista, Gustavo

Paper Abstract

Quantification is the research field that studies methods for counting the number of data points that belong to each class in an unlabeled sample. Traditionally, researchers in this field assume the availability of labeled observations for all classes to induce a quantification model. However, we often face situations where the number of classes is large or even unknown, or where we have reliable data for only a single class. When inducing a multi-class quantifier is infeasible, we are often concerned with estimates for a specific class of interest. In this context, we propose a novel setting known as One-class Quantification (OCQ). In contrast, Positive and Unlabeled Learning (PUL), another branch of Machine Learning, has offered solutions to OCQ, despite quantification not being the focal point of PUL. This article closes the gap between PUL and OCQ and brings both areas together under a unified view. We compare our method, Passive Aggressive Threshold (PAT), against PUL methods and show that PAT is generally the fastest and most accurate algorithm. PAT induces quantification models that can be reused to quantify different samples of data. We additionally introduce Exhaustive TIcE (ExTIcE), an improved version of the PUL algorithm Tree Induction for c Estimation (TIcE). We show that ExTIcE quantifies more accurately than PAT and the other assessed algorithms in scenarios where several negative observations are identical to the positive ones.
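To make the task setting concrete, below is a minimal, hypothetical sketch of one-class quantification on synthetic data: a scorer built from positive examples only, a threshold calibrated on held-out positives, and an adjusted count over the unlabeled sample. This is not the paper's PAT or ExTIcE algorithm; every function, constant, and data distribution here is an illustrative assumption.

```python
import numpy as np

# Hypothetical OCQ sketch: only positive training data is available, and the
# goal is to estimate the fraction of positives in an unlabeled sample.
# NOT the paper's PAT or ExTIcE method; a generic threshold-and-count example.

rng = np.random.default_rng(0)

# Positive training data: a single Gaussian cluster.
positives = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Unlabeled sample: 30% positives plus negatives from a shifted cluster.
unl_pos = rng.normal(loc=0.0, scale=1.0, size=(60, 2))
unl_neg = rng.normal(loc=5.0, scale=1.0, size=(140, 2))
unlabeled = np.vstack([unl_pos, unl_neg])

def score(points, reference):
    # Higher score = more similar to the positive reference set
    # (negative distance to the nearest reference point).
    dists = np.linalg.norm(points[:, None, :] - reference[None, :, :], axis=2)
    return -dists.min(axis=1)

# Split positives into a reference half and a calibration half, and pick a
# threshold that keeps ~95% of the calibration positives above it.
reference, calibration = positives[:100], positives[100:]
threshold = np.quantile(score(calibration, reference), 0.05)

# Count unlabeled points above the threshold and correct for the ~5% of
# positives expected to fall below it.
above = (score(unlabeled, reference) > threshold).mean()
estimated_positive_fraction = min(1.0, above / 0.95)

print(f"true positive fraction:      {60 / 200:.2f}")
print(f"estimated positive fraction: {estimated_positive_fraction:.2f}")
```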
