Paper Title

Multi-Study Boosting: Theoretical Considerations for Merging vs. Ensembling

Paper Authors

Cathy Shyr, Pragya Sur, Giovanni Parmigiani, Prasad Patil

Paper Abstract

Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. When training cross-study replicable prediction models, it is critical to decide between merging and treating the studies separately. We study boosting algorithms in the presence of potential heterogeneity in predictor-outcome relationships across studies and compare two multi-study learning strategies: 1) merging all the studies and training a single model, and 2) multi-study ensembling, which involves training a separate model on each study and ensembling the resulting predictions. In the regression setting, we provide theoretical guidelines based on an analytical transition point to determine whether it is more beneficial to merge or to ensemble for boosting with linear learners. In addition, we characterize a bias-variance decomposition of estimation error for boosting with component-wise linear learners. We verify the theoretical transition point result in simulation and illustrate how it can guide the decision on merging vs. ensembling in an application to breast cancer gene expression data.
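To make the comparison concrete, below is a minimal, illustrative sketch (not the authors' implementation) of the two strategies using L2 boosting with component-wise linear learners on simulated data. The hierarchical data-generating model, heterogeneity scale, number of boosting steps, learning rate, and equal ensemble weights are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def boost_cw_linear(X, y, n_steps=200, nu=0.1):
    """L2 boosting with component-wise linear learners (illustrative sketch).

    At each step, regress the current residuals on every single centered
    predictor, keep the predictor with the smallest residual sum of squares,
    and add a shrunken version of that simple fit to the model.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                      # center predictors on training means
    col_ss = (Xc ** 2).sum(axis=0)       # per-column sums of squares
    coef = np.zeros(X.shape[1])
    resid = y - y_mean                   # start from the mean-only model
    for _ in range(n_steps):
        betas = Xc.T @ resid / col_ss    # univariate least-squares slopes
        sse = ((resid[:, None] - Xc * betas) ** 2).sum(axis=0)
        j = int(np.argmin(sse))          # best single predictor this step
        coef[j] += nu * betas[j]
        resid -= nu * betas[j] * Xc[:, j]
    return y_mean, x_mean, coef


def predict(model, X):
    intercept, x_mean, coef = model
    return intercept + (X - x_mean) @ coef


# Simulate K heterogeneous studies: coefficients share a common mean but get
# study-specific random deviations (the cross-study heterogeneity).
K, n, p = 4, 100, 10
beta_mean = rng.normal(size=p)
make_beta = lambda: beta_mean + rng.normal(scale=0.5, size=p)  # heterogeneity scale is assumed
studies = []
for _ in range(K):
    X = rng.normal(size=(n, p))
    y = X @ make_beta() + rng.normal(size=n)
    studies.append((X, y))

# Held-out test study drawn from the same hierarchical model.
X_test = rng.normal(size=(500, p))
y_test = X_test @ make_beta() + rng.normal(size=500)

# Strategy 1: merge all studies and train a single boosted model.
X_merged = np.vstack([X for X, _ in studies])
y_merged = np.concatenate([y for _, y in studies])
mse_merged = np.mean((predict(boost_cw_linear(X_merged, y_merged), X_test) - y_test) ** 2)

# Strategy 2: multi-study ensembling -- one model per study, predictions
# averaged with equal weights (weighted ensembling is also possible).
study_preds = [predict(boost_cw_linear(X, y), X_test) for X, y in studies]
mse_ensemble = np.mean((np.mean(study_preds, axis=0) - y_test) ** 2)

print(f"merged MSE:   {mse_merged:.3f}")
print(f"ensemble MSE: {mse_ensemble:.3f}")
```

Re-running the sketch with a smaller or larger heterogeneity scale illustrates the trade-off the transition-point result formalizes: merging tends to win when studies are nearly homogeneous, while ensembling becomes preferable as between-study heterogeneity grows.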
