Paper title
Early stopping for $L^2$-boosting in high-dimensional linear models
Paper authors
Paper abstract
Increasingly high-dimensional data sets require that estimation methods not only satisfy statistical guarantees but also remain computationally feasible. In this context, we consider $L^{2}$-boosting via orthogonal matching pursuit in a high-dimensional linear model and analyze a data-driven early stopping time $\tau$ of the algorithm, which is sequential in the sense that its computation is based on the first $\tau$ iterations only. This approach is much less costly than established model selection criteria, which require the computation of the full boosting path. We prove that sequential early stopping preserves statistical optimality in this setting in terms of a fully general oracle inequality for the empirical risk and recently established optimal convergence rates for the population risk. Finally, an extensive simulation study shows that, at an immensely reduced computational cost, the performance of these types of methods is on par with other state-of-the-art algorithms such as the cross-validated Lasso or model selection via a high-dimensional Akaike criterion based on the full boosting path.
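To make the idea of a sequential stopping time concrete, the following is a minimal sketch in Python, not the paper's implementation. The abstract does not state the stopping rule, so the sketch assumes a residual-based (discrepancy-type) rule: stop at the first iteration $m$ at which the squared empirical residual norm $\|y - X\hat\beta^{(m)}\|^2 / n$ falls to or below a threshold `kappa`. The function name `omp_early_stopping`, the parameter `kappa`, and the simulated data are all hypothetical, and the noise level used as the threshold is assumed to be known, whereas in practice it would have to be estimated. Columns of `X` are assumed to be nonzero.

```python
import numpy as np


def omp_early_stopping(X, y, kappa, max_iter=None):
    """Orthogonal matching pursuit with an assumed residual-based stopping rule.

    Stops at the first iteration m with ||y - X beta_m||^2 / n <= kappa,
    so only the first m iterations are ever computed.
    Returns (beta, selected, m).
    """
    n, p = X.shape
    if max_iter is None:
        max_iter = min(n, p)
    col_norms = np.linalg.norm(X, axis=0)  # assumes no all-zero columns
    selected = []
    beta = np.zeros(p)
    residual = np.asarray(y, dtype=float).copy()
    m = 0
    while m < max_iter and residual @ residual / n > kappa:
        # Pick the column most correlated with the current residual.
        scores = np.abs(X.T @ residual) / col_norms
        if selected:
            scores[selected] = -np.inf  # never reselect a column
        j = int(np.argmax(scores))
        selected.append(j)
        # Refit by least squares on all selected columns (orthogonal projection).
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        beta = np.zeros(p)
        beta[selected] = coef
        residual = y - X @ beta
        m += 1
    return beta, selected, m


# Usage on simulated sparse data (all values here are illustrative).
rng = np.random.default_rng(0)
n, p, s = 100, 300, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 3.0
y = X @ beta_true + rng.standard_normal(n)

# kappa is set to the true noise variance, which is assumed known here.
beta_hat, selected, tau = omp_early_stopping(X, y, kappa=1.0)
print("stopped after", tau, "iterations; selected columns:", selected)
```

Because the loop checks the residual before each new selection step, the procedure never computes iterations beyond the stopping time, in contrast to criteria that must evaluate the full boosting path before choosing a model.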