Paper title
Linear-Sample Learning of Low-Rank Distributions
Paper authors
Paper abstract
Many latent-variable applications, including community detection, collaborative filtering, genomic analysis, and NLP, model data as generated by low-rank matrices. Yet despite considerable research, except in very special cases, the number of samples required to efficiently recover the underlying matrices has remained unknown. We determine the onset of learning in several common latent-variable settings. For all of them, we show that learning $k\times k$, rank-$r$ matrices to normalized $L_{1}$ distance $\varepsilon$ requires $\Omega(\frac{kr}{\varepsilon^2})$ samples, and propose an algorithm that uses ${\cal O}(\frac{kr}{\varepsilon^2}\log^2\frac{r}{\varepsilon})$ samples, a number linear in the high dimension $k$ and nearly linear in the typically low rank $r$. The algorithm improves on existing spectral techniques and runs in polynomial time. The proofs establish new results on the rapid convergence of the spectral distance between the model and observation matrices, and may be of independent interest.
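To make the setting concrete, the following is a minimal sketch of the kind of spectral baseline the abstract refers to (not the paper's improved algorithm): draw samples from a low-rank probability matrix, form the empirical matrix, and project it onto its top-$r$ singular directions. All dimensions and the random model here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setting: a rank-r, k x k probability matrix P (entries sum to 1).
k, r = 20, 2
A = rng.random((k, r))
B = rng.random((r, k))
P = A @ B
P /= P.sum()

# Draw n i.i.d. index pairs (i, j) ~ P and form the empirical matrix.
n = 200_000
flat = rng.choice(k * k, size=n, p=P.ravel())
emp = np.bincount(flat, minlength=k * k).reshape(k, k) / n

# Spectral step: truncate the SVD of the empirical matrix to rank r.
# (The resulting P_hat may have small negative entries; a full method
# would project back onto the probability simplex.)
U, s, Vt = np.linalg.svd(emp, full_matrices=False)
P_hat = (U[:, :r] * s[:r]) @ Vt[:r]

# Normalized L1 error of the spectral estimate vs. the raw empirical matrix.
err_spectral = np.abs(P_hat - P).sum()
err_empirical = np.abs(emp - P).sum()
print(f"L1 error, rank-{r} spectral estimate: {err_spectral:.4f}")
print(f"L1 error, raw empirical matrix:      {err_empirical:.4f}")
```

The point of the comparison is that exploiting the rank-$r$ structure typically reduces the $L_1$ error relative to the raw empirical matrix, consistent with the $kr$ (rather than $k^2$) sample-complexity scaling the abstract states.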