Paper title
High dimensional PCA: a new model selection criterion
Authors
Abstract
Given a random sample from a multivariate population, estimating the number of large eigenvalues of the population covariance matrix is an important problem in statistics, with wide applications in many areas. In the context of Principal Component Analysis (PCA), this number determines the linear combinations of the original variables that carry the largest amounts of variation. In this paper, we study the high dimensional asymptotic regime where the number of variables grows at the same rate as the number of observations, and use the spiked covariance model proposed in Johnstone (2001), under which the problem reduces to model selection. Our focus is on the Akaike Information Criterion (AIC), which is known to be strongly consistent from the work of Bai et al. (2018). However, Bai et al. (2018) require a certain "gap condition" ensuring that the dominant eigenvalues lie above a threshold strictly larger than the BBP threshold (Baik et al. (2005)), both quantities depending on the limiting ratio of the number of variables to the number of observations. It is well known that, below the BBP threshold, a spiked covariance structure becomes indistinguishable from one with no spikes. Thus the strong consistency of AIC requires some extra signal strength. In this paper, we investigate whether consistency continues to hold even if the "gap" is made smaller. We show that strong consistency under an arbitrarily small gap is achievable if we alter the penalty term of AIC suitably, depending on the target gap. Furthermore, another intuitive alteration of the penalty can indeed make the gap exactly zero, although we can only achieve weak consistency in this case. We compare the two newly proposed estimators with other existing estimators in the literature via extensive simulation studies, and show that, by suitably calibrating our proposals, a significant improvement in terms of mean-squared error is achievable.
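To make the role of the BBP threshold concrete, the following is a minimal sketch, not taken from the paper. It assumes the standard spiked model with unit noise variance and limiting ratio c = lim p/n, and uses the well-known random matrix limits: the threshold is 1 + sqrt(c), the bulk edge is (1 + sqrt(c))^2, and a spike ell above the threshold produces a sample eigenvalue converging to ell * (1 + c / (ell - 1)). The function names are hypothetical and chosen only for illustration; the penalty terms studied in the paper are not reproduced here.

```python
import math


def bbp_threshold(c):
    """BBP threshold 1 + sqrt(c), assuming unit noise variance and c = lim p/n."""
    if c <= 0:
        raise ValueError("c must be positive")
    return 1.0 + math.sqrt(c)


def bulk_edge(c):
    """Right edge (1 + sqrt(c))^2 of the Marchenko-Pastur bulk for unit noise variance."""
    return (1.0 + math.sqrt(c)) ** 2


def sample_eigenvalue_limit(ell, c):
    """Almost-sure limit of the sample eigenvalue associated with a population spike ell.

    Above the BBP threshold the limit separates from the bulk; at or below it,
    the limit sticks to the bulk edge, so the spike cannot be told apart from noise.
    """
    if ell > bbp_threshold(c):
        return ell * (1.0 + c / (ell - 1.0))
    return bulk_edge(c)


if __name__ == "__main__":
    c = 0.25  # illustrative ratio p/n, an assumption for this example
    print("BBP threshold:", bbp_threshold(c))
    print("bulk edge:", bulk_edge(c))
    for ell in (1.2, 1.5, 2.0, 3.0):
        print("spike", ell, "-> limiting sample eigenvalue", sample_eigenvalue_limit(ell, c))
```

Spikes at or below the threshold all map to the same limiting value, the bulk edge, which is why any consistent estimator of the number of spikes needs the dominant eigenvalues to exceed the BBP threshold, and why the size of the additional "gap" is the quantity of interest.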