Paper title

Posterior Calibrated Training on Sentence Classification Tasks

Authors

Taehee Jung, Dongyeop Kang, Hua Cheng, Lucas Mentch, Thomas Schaaf

Abstract

Most classification models work by first predicting a posterior probability distribution over all classes and then selecting the class with the largest estimated probability. In many settings, however, the quality of the posterior probability itself (e.g., a 65% chance of having diabetes) gives more reliable information than the final predicted class alone. When these models are shown to be poorly calibrated, most fixes to date have relied on post-hoc posterior calibration, which rescales the predicted probabilities but often has little impact on the final classifications. Here we propose an end-to-end training procedure called posterior calibrated (PosCal) training that directly optimizes the task objective while minimizing the difference between the predicted and empirical posterior probabilities. We show that PosCal not only helps reduce the calibration error but also improves task performance by penalizing drops in performance of both objectives. Compared to the baseline, PosCal achieves a task performance gain of about 2.5% and a calibration error reduction of 16.1% on GLUE (Wang et al., 2018). On xSLUE (Kang and Hovy, 2019), we achieve comparable task performance with a 13.2% calibration error reduction, but do not outperform the two-stage calibration baseline. PosCal training can be easily extended to any type of classification task as a form of regularization term. PosCal also has the advantage that it incrementally tracks the statistics needed for the calibration objective during training, making efficient use of large training sets.
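The combined objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the empirical posterior is estimated or how the two terms are combined, so the equal-width binning, the squared-difference penalty, the weight `lam`, and the function names `empirical_posterior` and `poscal_style_loss` are all assumptions made for illustration. The code uses plain Python rather than a deep-learning framework, so it shows only how the loss value could be computed, not how gradients would flow.

```python
import math


def _bin_index(p, num_bins):
    # Map a probability in [0, 1] to an equal-width bin (assumed binning scheme).
    return min(int(p * num_bins), num_bins - 1)


def empirical_posterior(probs, labels, num_bins=10):
    """Estimate q[k][b]: among examples whose predicted probability for
    class k falls in bin b, the fraction whose true label is k.
    Empty bins are reported as None."""
    num_classes = len(probs[0])
    counts = [[0] * num_bins for _ in range(num_classes)]
    hits = [[0] * num_bins for _ in range(num_classes)]
    for p, y in zip(probs, labels):
        for k, pk in enumerate(p):
            b = _bin_index(pk, num_bins)
            counts[k][b] += 1
            hits[k][b] += int(y == k)
    return [
        [hits[k][b] / counts[k][b] if counts[k][b] else None
         for b in range(num_bins)]
        for k in range(num_classes)
    ]


def poscal_style_loss(probs, labels, q, lam=1.0, num_bins=10):
    """Return (task_loss, calibration_loss, total_loss).

    task_loss: mean cross-entropy of the true class.
    calibration_loss: mean squared gap between each predicted probability
    and the empirical posterior of its bin (assumed distance measure).
    total_loss: task_loss + lam * calibration_loss (assumed weighting)."""
    eps = 1e-12
    task, cal = 0.0, 0.0
    for p, y in zip(probs, labels):
        task += -math.log(p[y] + eps)
        for k, pk in enumerate(p):
            target = q[k][_bin_index(pk, num_bins)]
            if target is not None:
                cal += (pk - target) ** 2
    n = len(probs)
    task /= n
    cal /= n * len(probs[0])
    return task, cal, task + lam * cal


# Toy batch: two confident, correct predictions and two uncertain ones.
probs = [[0.95, 0.05], [0.95, 0.05], [0.25, 0.75], [0.25, 0.75]]
labels = [0, 0, 1, 0]
q = empirical_posterior(probs, labels)
task_loss, cal_loss, total_loss = poscal_style_loss(probs, labels, q, lam=1.0)
```

In an actual training loop, the empirical posteriors would presumably be refreshed from running counts as training proceeds, in line with the abstract's remark about incrementally tracked statistics. How often to refresh them is a further design choice not specified here.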
