论文标题
生成主成分分析
Generative Principal Component Analysis
论文作者
论文摘要
在本文中,我们研究了主要成分分析的问题,并采用了生成建模假设,采用了一个普通矩阵的通用模型,该矩阵涵盖了包括尖峰矩阵恢复和相位检索在内的明显特殊情况。关键假设是,基础信号位于$ l $ -Lipschitz连续生成模型的范围内,该模型具有有限的$ k $二维输入。我们提出了一个二次估计器,并表明它享有顺序的统计率$ \ sqrt {\ frac {k \ log l} {m} {m}} $,其中$ m $是样本的数量。我们还提供了近乎匹配的算法独立的下限。此外,我们提供了经典功率方法的一种变体,该方法将计算的数据投射到每次迭代期间生成模型的范围内。我们表明,在适当的条件下,此方法将指数级的快速收敛到达到上述统计率的点。我们在各种图像数据集上对峰值矩阵和相位检索模型进行实验,并将方法的性能提高到经典功率方法以及为稀疏主组件分析而设计的截断功率方法。
In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate performance gains of our method to the classic power method and the truncated power method devised for sparse principal component analysis.