Paper title
Predicting cardiovascular risk from national administrative databases using a combined survival analysis and deep learning approach
Authors
Abstract
AIMS. This study compared the performance of deep learning extensions of survival analysis models with traditional Cox proportional hazards (CPH) models for deriving cardiovascular disease (CVD) risk prediction equations in national health administrative datasets. METHODS. Using individual person linkage of multiple administrative datasets, we constructed a cohort of all New Zealand residents aged 30-74 years who interacted with publicly funded health services during 2012, and identified hospitalisations and deaths from CVD over five years of follow-up. After excluding people with prior CVD or heart failure, sex-specific deep learning and CPH models were developed to estimate the risk of fatal or non-fatal CVD events within five years. The proportion of explained time-to-event occurrence, calibration, and discrimination were compared between models across the whole study population and in specific risk groups. FINDINGS. First CVD events occurred in 61,927 of 2,164,872 people. Among diagnoses and procedures, the largest 'local' hazard ratios were associated by the deep learning models with tobacco use in women (2.04, 95%CI: 1.99-2.10) and with chronic obstructive pulmonary disease with acute lower respiratory infection in men (1.56, 95%CI: 1.50-1.62). Other identified predictors (e.g. hypertension, chest pain, diabetes) aligned with current knowledge about CVD risk predictors. The deep learning models significantly outperformed the CPH models on the basis of proportion of explained time-to-event occurrence (Royston and Sauerbrei's R-squared: 0.468 vs. 0.425 in women and 0.383 vs. 0.348 in men), calibration, and discrimination (all p<0.0001). INTERPRETATION. Deep learning extensions of survival analysis models can be applied to large health administrative databases to derive interpretable CVD risk prediction equations that are more accurate than traditional CPH models.
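The abstract compares models on discrimination without naming the statistic used. As an illustration only, the sketch below computes Harrell's concordance index, a common discrimination measure for censored time-to-event data; whether the authors used this exact statistic is an assumption. The function name `concordance_index` and the toy data are hypothetical, and pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the person with the
    higher predicted risk has the event earlier. Simplified sketch: pairs
    with tied follow-up times are ignored."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): follow-up in years, 1 = CVD event, 0 = censored.
times = [2.0, 4.0, 5.0, 5.5]
events = [1, 0, 1, 0]
risk_scores = [0.9, 0.3, 0.1, 0.6]
c_index = concordance_index(times, events, risk_scores)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In the toy data, four pairs are comparable and three are ordered correctly. This pairwise loop is quadratic in the number of people, so a cohort of the size described above would need an optimized library implementation.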