Paper Title
Calibration of Pre-trained Transformers
Paper Authors
Paper Abstract
Pre-trained Transformers are now ubiquitous in natural language processing, but despite their high end-task performance, little is known empirically about whether they are calibrated. Specifically, do these models' posterior probabilities provide an accurate empirical measure of how likely the model is to be correct on a given example? We focus on BERT and RoBERTa in this work, and analyze their calibration across three tasks: natural language inference, paraphrase detection, and commonsense reasoning. For each task, we consider in-domain as well as challenging out-of-domain settings, where models face more examples they should be uncertain about. We show that: (1) when used out-of-the-box, pre-trained models are calibrated in-domain, and compared to baselines, their calibration error out-of-domain can be as much as 3.5x lower; (2) temperature scaling is effective at further reducing calibration error in-domain, and using label smoothing to deliberately increase empirical uncertainty helps calibrate posteriors out-of-domain.
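Since the abstract centers on three concrete ideas, calibration error, temperature scaling, and label smoothing, a minimal NumPy sketch of each may help make them precise. Everything below is illustrative only: the function names, the grid-search temperature fit, and the toy data are this sketch's own assumptions, not the paper's implementation.

```python
# Illustrative sketch (NumPy only): expected calibration error (ECE),
# temperature scaling, and label smoothing. Names and the grid-search
# fitting strategy are assumptions of this sketch, not the paper's code.
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence, then take the weighted average
    of the gap between each bin's accuracy and its mean confidence."""
    confidences = probs.max(axis=1)
    accuracies = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean()
                                     - confidences[mask].mean())
    return ece

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the scalar T > 0 minimizing held-out NLL of softmax(logits / T).
    A simple grid search stands in for the gradient-based fit that is
    more common in practice."""
    best_t, best_nll = 1.0, np.inf
    for t in grid:
        probs = softmax(logits / t)
        nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t

def smooth_labels(labels, n_classes, alpha=0.1):
    """Label smoothing: replace one-hot targets with (1 - alpha) on the
    gold class and alpha / (K - 1) spread over the other classes."""
    targets = np.full((len(labels), n_classes), alpha / (n_classes - 1))
    targets[np.arange(len(labels)), labels] = 1.0 - alpha
    return targets

# Toy usage on random logits (stand-ins for real dev-set model outputs).
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 3)) * 3
labels = rng.integers(0, 3, size=1000)
t = fit_temperature(logits, labels)
print("T =", t)
print("ECE before:", expected_calibration_error(softmax(logits), labels))
print("ECE after: ", expected_calibration_error(softmax(logits / t), labels))
```

Note that temperature scaling is a post-hoc fix: T is fit on a held-out development set and applied unchanged at test time, and since dividing logits by a positive scalar does not change the argmax, accuracy is unaffected while the posteriors become softer (T > 1) or sharper (T < 1).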