论文标题

改进的保证和用于列子集选择和NyStröm方法的多种曲线

Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method

论文作者

Dereziński, Michał, Khanna, Rajiv, Mahoney, Michael W.

论文摘要

列子集选择问题(CSSP)和NyStröm方法是在机器学习和科学计算中构建大型数据集的小近似值的主要工具。该领域的一个基本问题是:k的数据子集与最佳等级k近似竞争如何?我们开发了利用数据矩阵的光谱特性以获得改进的近似保证的技术,这超出了标准的最差分析。我们的方法会导致具有已知奇异值衰减速率(例如多项式衰减)的数据集的明显更好的界限。我们的分析还揭示了一种有趣的现象:K的近似因子可能显示出多个峰值和山谷,我们称之为多潜度曲线。我们确定的下限表明,这种行为不是我们分析的伪像,而是CSSP和NyStröm任务的固有属性。最后,使用径向基函数(RBF)内核的示例,我们表明我们的改进边界和多种曲线都可以通过更改RBF参数来观察到实际数据集。

The Column Subset Selection Problem (CSSP) and the Nyström method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing. A fundamental question in this area is: how well can a data subset of size k compete with the best rank k approximation? We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees which go beyond the standard worst-case analysis. Our approach leads to significantly better bounds for datasets with known rates of singular value decay, e.g., polynomial or exponential decay. Our analysis also reveals an intriguing phenomenon: the approximation factor as a function of k may exhibit multiple peaks and valleys, which we call a multiple-descent curve. A lower bound we establish shows that this behavior is not an artifact of our analysis, but rather it is an inherent property of the CSSP and Nyström tasks. Finally, using the example of a radial basis function (RBF) kernel, we show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源