Paper title

New algorithms for computing the least trimmed squares estimator

Paper author

Zuo, Yijun

Abstract

Instead of minimizing the sum of all $n$ squared residuals, as classical least squares (LS) does, Rousseeuw (1984) proposed minimizing the sum of the $h$ ($n/2 \leq h < n$) smallest squared residuals; the resulting estimator is called least trimmed squares (LTS). The idea of the LTS is simple, but its computation is challenging because no LS-type closed-form formula exists. Several attempts have been made since its introduction: the feasible solution algorithm (Hawkins (1994)), fastlts.f (Rousseeuw and Van Driessen (1999)), and FAST-LTS (Rousseeuw and Van Driessen (2006)), among others, are promising approximate algorithms. The latter two have been incorporated into the R function ltsReg by Valentin Todorov. These algorithms use combinatorial or subsampling approaches. Thanks to accessible software and fast computation, the LTS, which enjoys many desirable properties, has become one of the most popular robust regression estimators across multiple disciplines. This article proposes analytic approaches that employ the first-order derivative (gradient) and second-order derivative (Hessian matrix) of the objective function. Our approximate algorithms for the LTS are vetted on synthetic and real data examples. Compared with ltsReg, the benchmark in robust regression and well known for its speed, our algorithms are comparable (and sometimes even favorable) with respect to both speed and accuracy. Other major contributions include (i) establishing the uniqueness and the strong and Fisher consistency in the empirical and population settings, respectively; (ii) deriving the influence function in a general setting; (iii) re-establishing the asymptotic normality (and consequently root-n consistency) of the estimator with a neat and general approach.
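
To make the objective in the abstract concrete, the following is a minimal sketch in plain Python of the LTS criterion (the sum of the $h$ smallest squared residuals) and of its gradient when the set of $h$ retained observations is held fixed. The function names `lts_objective` and `lts_gradient` are hypothetical and chosen for illustration only. This is not the paper's algorithm, nor the implementation behind ltsReg; it only evaluates the quantity that such algorithms try to minimize.

```python
def squared_residuals(beta, X, y):
    """Squared residuals of the linear fit sum_j beta[j] * row[j] for each row of X."""
    return [(yi - sum(b * x for b, x in zip(beta, row))) ** 2
            for row, yi in zip(X, y)]


def lts_objective(beta, X, y, h):
    """LTS criterion: sum of the h smallest squared residuals (n/2 <= h < n)."""
    return sum(sorted(squared_residuals(beta, X, y))[:h])


def lts_gradient(beta, X, y, h):
    """Gradient of the criterion with the currently retained h points held fixed.

    Assumption: the h-subset does not change in a neighbourhood of beta,
    so the criterion is locally an ordinary least squares objective on it.
    """
    sq = squared_residuals(beta, X, y)
    # Indices of the h observations with the smallest squared residuals.
    keep = sorted(range(len(y)), key=lambda i: sq[i])[:h]
    grad = []
    for j in range(len(beta)):
        total = 0.0
        for i in keep:
            r_i = y[i] - sum(b * x for b, x in zip(beta, X[i]))
            total += r_i * X[i][j]
        grad.append(-2.0 * total)
    return grad


# Illustrative data: four points on the line y = x and one gross outlier.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 10.0]]
y = [0.0, 1.0, 2.0, 3.0, 100.0]
beta = [0.0, 1.0]  # intercept 0, slope 1

print(lts_objective(beta, X, y, h=4))  # outlier trimmed
print(lts_objective(beta, X, y, h=5))  # h = n reproduces the LS criterion
```

With $h = 4$ the outlier is trimmed and the line through the four clean points has zero criterion value, whereas taking $h = n$ recovers the ordinary LS sum of squares, which the outlier dominates.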
