Paper title
Examining marginal properness in the external validation of survival models with squared and logarithmic losses
Paper authors
Paper abstract
Scoring rules promote rational and honest decision-making, which is important for model evaluation and is becoming increasingly important for automated procedures such as 'AutoML'. In this paper we survey common squared and logarithmic scoring rules for survival analysis, with a focus on their theoretical and empirical properness. We introduce a marginal definition of properness and show that both the Integrated Survival Brier Score (ISBS) and the Right-Censored Log-Likelihood (RCLL) are theoretically improper under this definition. We also investigate a new class of losses that may inform future survival scoring rules. Simulation experiments reveal that both the ISBS and the RCLL behave as proper scoring rules in practice. The RCLL showed no violations in any setting, while the ISBS exhibited only minor, negligible violations at extremely small sample sizes, suggesting that results from historical experiments can be trusted. We therefore advocate for both the RCLL and the ISBS in the external validation of models, including in automated procedures. However, we note practical challenges in estimating these losses, including the estimation of censoring distributions and densities; further research is therefore required to advance the development of robust and honest evaluation in survival analysis.