Paper title
Optimal subsampling for quantile regression in big data
Paper authors
Paper abstract
We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the asymptotic variance-covariance matrix of a linearly transformed parameter estimator, and the other minimizes the trace of the asymptotic variance-covariance matrix of the original parameter estimator. The former does not depend on the densities of the responses given the covariates and is easy to implement. We propose algorithms based on the optimal subsampling probabilities and establish the asymptotic distributions and asymptotic optimality of the resulting estimators. Furthermore, we propose an iterative subsampling procedure based on the optimal subsampling probabilities for the linearly transformed parameter estimator; this procedure scales well and can make full use of the available computational resources. In addition, it yields standard errors for the parameter estimators without estimating the densities of the responses given the covariates. We provide numerical examples based on both simulated and real data to illustrate the proposed method.
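The abstract gives no formulas, so the following is only a minimal sketch of the general workflow under stated assumptions, not the authors' algorithm. We assume the density-free ("linearly transformed") probabilities take the form π_i ∝ |τ − 1{y_i − x_iᵀβ̃ < 0}|·‖x_i‖, where β̃ is a pilot estimate. This is the norm of the quantile-regression score, and it involves no conditional density. We also assume that the standard errors can be obtained from the spread of K repeated subsample estimates that share one pilot fit. The names `check_loss`, `weighted_qr`, `score_norm_probs` and `subsample_qr` are hypothetical. Weighted quantile regression is solved here by iteratively reweighted least squares (IRLS), which stands in for an exact linear-programming solver.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))


def weighted_qr(X, y, tau, w, n_iter=200, eps=1e-6, tol=1e-8):
    """Approximately minimise sum_i w_i * rho_tau(y_i - x_i' beta) by IRLS.

    IRLS (a majorise-minimise scheme) is an assumed stand-in for an exact
    linear-programming quantile-regression solver.
    """
    w = w / w.mean()  # rescaling all weights does not change the minimiser
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        # rho_tau(r) = a * |r| with a = tau (r >= 0) or 1 - tau (r < 0);
        # a * |r| is majorised by a quadratic with weight a / |r_current|.
        c = np.where(r >= 0, tau, 1 - tau) / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w * c)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def score_norm_probs(X, y, beta_pilot, tau):
    """Assumed density-free probabilities: pi_i ~ |tau - 1{e_i < 0}| * ||x_i||."""
    e = y - X @ beta_pilot
    s = np.abs(tau - (e < 0).astype(float)) * np.linalg.norm(X, axis=1)
    return s / s.sum()


def subsample_qr(X, y, tau, r0, r, K, rng):
    """Pilot fit, then K subsamples of size r drawn with score-based probabilities."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample and pilot estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta_pilot = weighted_qr(X[idx0], y[idx0], tau, np.ones(r0))
    # Step 2: subsampling probabilities computed from the pilot residuals.
    pi = score_norm_probs(X, y, beta_pilot, tau)
    est = []
    for _ in range(K):
        idx = rng.choice(n, size=r, replace=True, p=pi)
        # Inverse-probability weights make the subsample objective unbiased
        # (up to the factor r) for the full-data objective.
        est.append(weighted_qr(X[idx], y[idx], tau, 1.0 / pi[idx]))
    est = np.array(est)
    # Average the K estimates; their spread gives a standard error without
    # estimating the density of y given x (subsampling variability only).
    return est.mean(axis=0), est.std(axis=0, ddof=1) / np.sqrt(K), pi


# Usage on simulated data: median regression (tau = 0.5) with t(3) noise.
rng = np.random.default_rng(0)
n = 100_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.standard_t(df=3, size=n)  # noise median is 0
beta_hat, se, pi = subsample_qr(X, y, tau=0.5, r0=200, r=1000, K=10, rng=rng)
print("estimate:", beta_hat)
print("std. error:", se)
```

The probabilities in this sketch depend only on the sign of the pilot residual and on ‖x_i‖. That is why such a scheme can avoid density estimation. The price is that it targets the trace of a transformed covariance rather than the covariance of β̂ itself.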