Paper Title
Sparse PCA With Multiple Components
Paper Authors
Paper Abstract
Sparse Principal Component Analysis (sPCA) is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. This involves solving a sparsity and orthogonality constrained convex maximization problem, which is extremely computationally challenging. Most existing works address sparse PCA via methods, such as iteratively computing one sparse PC and deflating the covariance matrix, that do not guarantee the orthogonality, let alone the optimality, of the resulting solution when we seek multiple mutually orthogonal PCs. We challenge this status quo by reformulating the orthogonality conditions as rank constraints and optimizing over the sparsity and rank constraints simultaneously. We design tight semidefinite relaxations to supply high-quality upper bounds, which we strengthen via additional second-order cone inequalities when each PC's individual sparsity is specified. Further, we derive a combinatorial upper bound on the maximum amount of variance explained as a function of the support. We exploit these relaxations and bounds to propose exact methods and rounding mechanisms that, together, obtain solutions with a bound gap on the order of 0%-15% for real-world datasets with p = 100s or 1000s of features and r ∈ {2, 3} components. Numerically, our algorithms match (and sometimes surpass) the best-performing methods in terms of the fraction of variance explained and systematically return PCs that are sparse and orthogonal. In contrast, we find that existing methods like deflation return solutions that violate the orthogonality constraints, even when the data is generated according to sparse orthogonal PCs. Altogether, our approach solves sparse PCA problems with multiple components to certifiable (near) optimality in a practically tractable fashion.
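The abstract contrasts the proposed exact approach with the common deflate-and-repeat heuristic. The sketch below is not the paper's method; it is a minimal, hypothetical NumPy illustration of that baseline, assuming a simple hard-thresholding step as the single-component solver (the names sparse_leading_pc and deflation_sparse_pca are illustrative). It shows why the returned PCs need not be mutually orthogonal, consistent with the abstract's numerical findings.

```python
import numpy as np

def sparse_leading_pc(sigma, k):
    # Hypothetical single-component step: take the leading eigenvector of
    # sigma, keep only its k largest-magnitude entries, and renormalize.
    # This stands in for any off-the-shelf single-PC sparse PCA solver.
    _, vecs = np.linalg.eigh(sigma)
    v = vecs[:, -1].copy()
    keep = np.argsort(np.abs(v))[-k:]
    mask = np.zeros(v.shape, dtype=bool)
    mask[keep] = True
    v[~mask] = 0.0
    return v / np.linalg.norm(v)

def deflation_sparse_pca(sigma, r, k):
    # The deflate-and-repeat scheme the abstract contrasts with: compute one
    # sparse PC, project it out of the covariance matrix (projection
    # deflation), and repeat r times. Nothing here enforces orthogonality
    # between the sparse PCs themselves.
    p = sigma.shape[0]
    pcs = []
    for _ in range(r):
        v = sparse_leading_pc(sigma, k)
        pcs.append(v)
        proj = np.eye(p) - np.outer(v, v)
        sigma = proj @ sigma @ proj
    return np.column_stack(pcs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 20))
    sigma = np.cov(X, rowvar=False)
    U = deflation_sparse_pca(sigma, r=3, k=5)
    # With sparse PCs, U^T U is generally not the identity: hard-thresholding
    # breaks the exact orthogonality that deflation alone would provide.
    print(np.round(U.T @ U, 3))
```

In this sketch, each deflated matrix's leading eigenvector is exactly orthogonal to the previous PC, but the subsequent thresholding step destroys that orthogonality, which is the failure mode the abstract reports for deflation-based methods.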