Paper title
Hidden Heterogeneity: When to Choose Similarity-Based Calibration
Paper authors
Paper abstract
Trustworthy classifiers are essential to the adoption of machine learning predictions in many real-world settings. The predicted probability of possible outcomes can inform high-stakes decision making, particularly when assessing the expected value of alternative decisions or the risk of bad outcomes. These decisions require well-calibrated probabilities, not just the correct prediction of the most likely class. Black-box classifier calibration methods can improve the reliability of a classifier's output without requiring retraining. However, these methods are unable to detect subpopulations where calibration could also improve prediction accuracy. Such subpopulations are said to exhibit "hidden heterogeneity" (HH), because the original classifier did not detect them. This paper proposes a quantitative measure for HH. It also introduces two similarity-weighted calibration methods that can address HH by adapting locally to each test item: SWC weights the calibration set by similarity to the test item, and SWC-HH explicitly incorporates hidden heterogeneity to filter the calibration set. Experiments show that the improvements in calibration achieved by similarity-based calibration methods correlate with the amount of HH present and, given sufficient calibration data, generally exceed calibration achieved by global methods. HH can therefore serve as a useful diagnostic tool for identifying when local calibration methods would be beneficial.
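The abstract describes SWC only at a high level: each calibration example is weighted by its similarity to the test item before calibrating. The block below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a Gaussian (RBF) kernel over feature vectors as the similarity measure and a Platt-style logistic calibrator fit to the weighted calibration set. Neither choice is specified in the abstract, and the names `rbf_similarity`, `weighted_platt`, `swc_predict`, and `bandwidth` are hypothetical. The HH measure and the filtering step of SWC-HH are not reproduced because the abstract does not define them.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rbf_similarity(a: Sequence[float], b: Sequence[float], bandwidth: float) -> float:
    """Assumed similarity (hypothetical choice): Gaussian kernel on squared Euclidean distance."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def weighted_platt(scores: Sequence[float], labels: Sequence[int],
                   weights: Sequence[float], lr: float = 0.5,
                   steps: int = 2000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by gradient descent on the weighted log loss."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all calibration weights are zero; try a larger bandwidth")
    a, b = 1.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y, w in zip(scores, labels, weights):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += w * err * s
            grad_b += w * err
        a -= lr * grad_a / total
        b -= lr * grad_b / total
    return a, b


def swc_predict(test_x, test_score, calib_x, calib_scores, calib_labels,
                bandwidth: float = 1.0) -> float:
    """Hypothetical SWC-style step: weight calibration items by similarity, then calibrate."""
    weights = [rbf_similarity(test_x, x, bandwidth) for x in calib_x]
    a, b = weighted_platt(calib_scores, calib_labels, weights)
    return sigmoid(a * test_score + b)


# Toy data (made up for illustration): the base classifier outputs 0.5 for everyone,
# but items near x=0 are mostly positive and items near x=10 are mostly negative.
calib_x = [[0.0], [0.1], [0.2], [0.3], [0.4], [10.0], [10.1], [10.2], [10.3], [10.4]]
calib_scores = [0.5] * 10
calib_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

local_a = swc_predict([0.2], 0.5, calib_x, calib_scores, calib_labels)
local_b = swc_predict([10.2], 0.5, calib_x, calib_scores, calib_labels)

# Global baseline: the same calibrator with uniform weights over the whole set.
ga, gb = weighted_platt(calib_scores, calib_labels, [1.0] * len(calib_labels))
global_p = sigmoid(ga * 0.5 + gb)
print(round(local_a, 2), round(local_b, 2), round(global_p, 2))
```

The toy data mimics hidden heterogeneity: the base classifier scores both subpopulations identically, so the single global calibrator stays near 0.5, while the similarity-weighted fit moves toward each group's local positive rate. The bandwidth controls how local the fit is; if it is so small that every weight underflows to zero, the sketch raises an error rather than dividing by zero.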