Paper title

Approximating Partial Likelihood Estimators via Optimal Subsampling

Authors

Haixiang Zhang, Lulu Zuo, HaiYing Wang, Liuquan Sun

Abstract

With the growing availability of large-scale biomedical data, performing traditional statistical analyses directly is often time-consuming or infeasible with the relatively limited computing resources at hand. We propose a fast subsampling method to effectively approximate the full-data maximum partial likelihood estimator in Cox's model, which greatly reduces the computational burden of analyzing massive survival data. We establish the consistency and asymptotic normality of a general subsample-based estimator. Optimal subsampling probabilities with explicit expressions are determined by minimizing the trace of the asymptotic variance-covariance matrix of a linearly transformed parameter estimator. For practical implementation, we propose a two-step subsampling algorithm that substantially reduces computing time compared with the full-data method. The asymptotic properties of the resulting two-step subsample-based estimator are also established. Extensive numerical experiments and a real-world example are provided to assess the proposed subsampling strategy.
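The abstract describes the pipeline only in words, so the Python/NumPy sketch below illustrates its general shape under stated assumptions; it is not the authors' implementation. It fits an inverse-probability-weighted Cox partial likelihood by Newton-Raphson on a subsample drawn with replacement, and runs a two-step scheme: a uniform pilot subsample, then a non-uniform subsample. The names `weighted_cox_fit`, `score_residuals`, `two_step_subsample`, and the mixing parameter `alpha` are hypothetical. The second-step probabilities (proportional to the norm of each observation's score residual at the pilot estimate, mixed with uniform sampling) are an assumed L-optimality-style choice, not the explicit expressions derived in the paper, and the final fit uses only the second-step subsample rather than combining it with the pilot draw.

```python
import numpy as np


def _last_at_risk(ts):
    """ts sorted decreasingly: index of the last row with time >= ts[i] (Breslow ties)."""
    return len(ts) - 1 - np.searchsorted(ts[::-1], ts, side="left")


def weighted_cox_fit(X, time, status, w, n_iter=50, tol=1e-8):
    """Newton-Raphson for the weighted log partial likelihood
    sum_i w_i d_i [b'X_i - log sum_{j: Y_j >= Y_i} w_j exp(b'X_j)]."""
    order = np.argsort(-time)                      # decreasing time: risk sets become prefixes
    X, ts, d = X[order], time[order], (w * status)[order]
    w = w[order]
    last = _last_at_risk(ts)                       # handles ties from sampling with replacement
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = w * np.exp(X @ beta)                   # weighted relative risks
        S0 = np.cumsum(r)[last]                    # risk-set sums, one per row
        S1 = np.cumsum(r[:, None] * X, axis=0)[last]
        S2 = np.cumsum(r[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)[last]
        xbar = S1 / S0[:, None]
        grad = d @ (X - xbar)                      # weighted score
        info = np.einsum("i,ijk->jk", d, S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :])
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def score_residuals(X, time, status, beta):
    """Per-observation score contributions (unit weights); they sum to the full-data score."""
    order = np.argsort(-time)
    Xs, ts, ds = X[order], time[order], status[order].astype(float)
    r = np.exp(Xs @ beta)
    last = _last_at_risk(ts)
    S0 = np.cumsum(r)[last]
    xbar = np.cumsum(r[:, None] * Xs, axis=0)[last] / S0[:, None]
    # Compensator sums over events k with Y_k <= Y_i, i.e. sorted positions k >= first[i].
    first = len(ts) - np.searchsorted(ts[::-1], ts, side="right")
    A = np.cumsum((ds / S0)[::-1])[::-1][first]
    B = np.cumsum((ds[:, None] * xbar / S0[:, None])[::-1], axis=0)[::-1][first]
    U = ds[:, None] * (Xs - xbar) - r[:, None] * (Xs * A[:, None] - B)
    out = np.empty_like(U)
    out[order] = U                                 # back to the input row order
    return out


def two_step_subsample(X, time, status, r0, r, rng, alpha=0.1):
    """Hypothetical two-step scheme: uniform pilot, then score-norm probabilities."""
    n = X.shape[0]
    idx0 = rng.choice(n, size=r0, replace=True)    # step 1: uniform pilot, 1/pi_i = n
    beta_pilot = weighted_cox_fit(X[idx0], time[idx0], status[idx0], np.full(r0, float(n)))
    norms = np.linalg.norm(score_residuals(X, time, status, beta_pilot), axis=1)
    pi = (1 - alpha) * norms / norms.sum() + alpha / n   # mix with uniform so pi_i > 0
    idx = rng.choice(n, size=r, replace=True, p=pi)      # step 2: informative subsample
    return weighted_cox_fit(X[idx], time[idx], status[idx], 1.0 / pi[idx])
```

Mixing with uniform probabilities keeps every sampling probability positive, so censored observations, which contribute little to the score but still belong in the risk sets, can enter the subsample. Breslow's convention is used for the tied times that sampling with replacement creates.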