Paper Title
Detecting Label Errors by using Pre-Trained Language Models
Paper Authors
Paper Abstract
We show that large pre-trained language models are inherently highly capable of identifying label errors in natural language datasets: simply examining out-of-sample data points in descending order of fine-tuned task loss significantly outperforms more complex error-detection mechanisms proposed in previous work. To this end, we contribute a novel method for introducing realistic, human-originated label noise into existing crowdsourced datasets such as SNLI and TweetNLP. We show that this noise has similar properties to real, hand-verified label errors, and is harder to detect than existing synthetic noise, creating challenges for model robustness. We argue that human-originated noise is a better standard for evaluation than synthetic noise. Finally, we use crowdsourced verification to evaluate the detection of real errors on IMDB, Amazon Reviews, and Recon, and confirm that pre-trained models perform at a 9-36% higher absolute Area Under the Precision-Recall Curve than existing models.
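The core detection procedure described in the abstract is simple: fine-tune a pre-trained model on the task, then rank held-out (out-of-sample) examples by their task loss in descending order and inspect the top of the ranking for annotation mistakes. The sketch below is not the authors' code; it is a minimal illustration of that ranking step, assuming a fine-tuned Hugging Face sequence-classification checkpoint (the path `path/to/finetuned-model` and the helper `rank_by_loss` are hypothetical).

```python
# Minimal sketch (not the paper's implementation): rank out-of-sample examples
# by fine-tuned task loss, highest first, to surface likely label errors.
import torch
from torch.nn.functional import cross_entropy
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical path to a model already fine-tuned on the target task.
model_name = "path/to/finetuned-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

def rank_by_loss(examples):
    """Sort (text, label) pairs by per-example cross-entropy loss, descending."""
    losses = []
    with torch.no_grad():
        for text, label in examples:
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            logits = model(**inputs).logits            # shape: (1, num_labels)
            loss = cross_entropy(logits, torch.tensor([label]))
            losses.append(loss.item())
    order = sorted(range(len(examples)), key=lambda i: losses[i], reverse=True)
    return [(examples[i], losses[i]) for i in order]

# Usage: examples with the highest loss are the most suspect labels.
# suspects = rank_by_loss(held_out_examples)
# for (text, label), loss in suspects[:20]:
#     print(f"{loss:.3f}\t{label}\t{text[:80]}")
```

In this sketch, "out-of-sample" simply means the scored examples were held out from the fine-tuning split, so a high loss reflects disagreement between the model and the assigned label rather than memorization.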