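Re-evaluation of the comparative effectiveness of bootstrap-based optimism correction methods in the development of multivariable clinical prediction models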

Paper title

Re-evaluation of the comparative effectiveness of bootstrap-based optimism correction methods in the development of multivariable clinical prediction models

Authors

Iba, Katsuhiro; Shinozaki, Tomohiro; Maruo, Kazushi; Noma, Hisashi

Abstract

Multivariable prediction models are important statistical tools for providing synthetic diagnostic and prognostic algorithms based on multiple patient characteristics. Their apparent discrimination and calibration measures usually have overestimation biases (known as 'optimism') relative to their actual performance in external populations. Existing statistical evidence and guidelines suggest that three bootstrap-based bias correction methods are preferable in practice, namely Harrell's bias correction and the .632 and .632+ estimators. Although Harrell's method has been widely adopted in clinical studies, simulation-based evidence indicates that the .632+ estimator may perform better than the other two methods. However, this evidence is limited, and the actual comparative effectiveness of these methods remains unclear. In this article, we conducted extensive simulations to compare the effectiveness of these methods, particularly using the following modern regression models: conventional logistic regression, stepwise variable selection, Firth's penalized likelihood method, ridge, lasso, and elastic-net. Under relatively large sample settings, the three bootstrap-based methods were comparable and performed well. However, all three methods were biased under small sample settings, and the directions and sizes of the biases were inconsistent. In general, the .632+ estimator is recommended, but we provide several notes concerning the operating characteristics of each method.
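To make the three corrections concrete, here is a minimal Python sketch (standard library only) for the C-statistic. It is not the authors' simulation code, and every function name in it is hypothetical. Assumptions: the model is an unpenalized logistic regression fitted by plain gradient ascent (`fit_logistic` is an illustrative helper, not a library function); Harrell's correction subtracts the mean bootstrap optimism (the bootstrap model's AUC on its own bootstrap sample minus its AUC on the original data) from the apparent AUC; the .632 estimator is 0.368 × apparent AUC + 0.632 × mean out-of-bag AUC; and the .632+ estimator uses Efron and Tibshirani's weight w = 0.632 / (1 − 0.368 R), adapted here to the C-statistic with a no-information value of 0.5. Resamples whose bootstrap or out-of-bag set lacks one outcome class are skipped.

```python
import math
import random
from statistics import mean


def c_statistic(y, scores):
    """C-statistic (AUC): share of (case, non-case) pairs ranked correctly; ties count 1/2."""
    cases = [s for s, t in zip(scores, y) if t == 1]
    controls = [s for s, t in zip(scores, y) if t == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    total = 0.0
    for a in cases:
        for b in controls:
            total += 1.0 if a > b else (0.5 if a == b else 0.0)
    return total / (len(cases) * len(controls))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, iters=100, lr=0.5):
    """Hypothetical helper: unpenalized logistic regression via batch gradient ascent."""
    p = len(X[0]) + 1  # one coefficient per predictor plus an intercept
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)
            resid = yi - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(p):
                grad[j] += resid * row[j]
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta


def linear_predictor(beta, X):
    # The AUC depends only on ranks, so the linear predictor serves as the risk score.
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], xi)) for xi in X]


def has_both_classes(y):
    return 0 < sum(y) < len(y)


def corrected_auc(X, y, B=100, seed=1, gamma=0.5):
    """Apparent, Harrell, .632 and .632+ estimates of the C-statistic (sketch)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(y, linear_predictor(fit_logistic(X, y), X))
    optimism, oob = [], []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        out = sorted(set(range(n)) - set(idx))  # out-of-bag observations
        yb, y_out = [y[i] for i in idx], [y[i] for i in out]
        if not has_both_classes(yb) or not has_both_classes(y_out):
            continue  # an AUC would be undefined for this resample
        Xb = [X[i] for i in idx]
        beta = fit_logistic(Xb, yb)
        # Harrell: AUC on the bootstrap sample minus AUC on the original data.
        optimism.append(c_statistic(yb, linear_predictor(beta, Xb))
                        - c_statistic(y, linear_predictor(beta, X)))
        # .632 family: AUC on the observations left out of this resample.
        oob.append(c_statistic(y_out, linear_predictor(beta, [X[i] for i in out])))
    theta_oob = mean(oob)
    # .632+: clip the out-of-bag AUC at the no-information value gamma, then
    # compute the relative overfitting rate R, which lies in [0, 1].
    oob_clip = max(theta_oob, gamma)
    R = (apparent - oob_clip) / (apparent - gamma) if apparent > oob_clip else 0.0
    w = 0.632 / (1 - 0.368 * R)
    return {
        "apparent": apparent,
        "oob": theta_oob,
        "harrell": apparent - mean(optimism),
        ".632": 0.368 * apparent + 0.632 * theta_oob,
        ".632+": (1 - w) * apparent + w * oob_clip,
        "weight": w,
    }


rng = random.Random(2021)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
# Simulated outcome: only the first of three predictors is truly associated with risk.
y = [1 if rng.random() < sigmoid(0.8 * x[0]) else 0 for x in X]
res = corrected_auc(X, y, B=40)
for name in ("apparent", "harrell", ".632", ".632+"):
    print(f"{name:>8}: {res[name]:.3f}")
```

Because the .632+ weight w lies between 0.632 and 1, the .632+ estimate always falls between the apparent AUC and the (clipped) out-of-bag AUC, moving further toward the latter as overfitting grows. With only 40 resamples the estimates are noisy; in practice more resamples would usually be drawn, and the stepwise or penalized fits compared in the article would take the place of `fit_logistic`.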