Paper Title
Spectral Clustering Using Eigenspectrum Shape Based Nystrom Sampling
Authors
Abstract
Spectral clustering has shown superior performance in analyzing cluster structure. However, its computational complexity limits its application to large-scale data. To address this problem, many low-rank matrix approximation algorithms have been proposed, including the Nystrom method, an approach with proven approximation error bounds. Several algorithms provide recipes for constructing Nystrom approximations with varying accuracy and computing time. This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic on when to use it. Our heuristic depends on the eigenspectrum shape of the dataset, and yields competitive low-rank approximations on test datasets compared to other state-of-the-art methods.
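For readers unfamiliar with the Nystrom method mentioned in the abstract, the following is a minimal sketch of the standard low-rank approximation K ≈ C W⁺ Cᵀ, where C holds the similarities between all points and a set of sampled landmark points, and W holds the similarities among the landmarks. It uses plain uniform random sampling, not the CMS3 procedure proposed in the paper, whose details the abstract does not give. The function names, the Gaussian (RBF) similarity, and the values of `gamma`, `rcond`, and the landmark count are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian similarity exp(-gamma * ||x - y||^2) between rows of X and Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def uniform_landmarks(n, m, seed=0):
    """Pick m distinct landmark indices uniformly at random (assumed baseline sampler)."""
    rng = np.random.default_rng(seed)
    return rng.choice(n, size=m, replace=False)


def nystrom_approximation(C, W, rcond=1e-10):
    """Return the Nystrom approximation C @ pinv(W) @ C.T.

    C is the n-by-m block of similarities to the landmarks,
    W is the m-by-m block of similarities among the landmarks.
    """
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T


# Toy data: 200 points in 2-D, 20 landmarks (both sizes are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
idx = uniform_landmarks(len(X), 20)

C = rbf_kernel(X, X[idx])   # n x m: only m columns of the kernel are computed
W = C[idx]                  # m x m: rows of C at the landmark indices
K_approx = nystrom_approximation(C, W)

# The full kernel is built here only to measure the error on this toy example.
K_full = rbf_kernel(X, X)
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(f"relative Frobenius error: {rel_err:.3e}")
```

In a spectral clustering pipeline, the approximation would normally be used without forming the full n-by-n matrix: the leading eigenvectors are estimated from C and W alone, and the choice of landmark indices (here uniform) is the step a sampling procedure such as CMS3 would replace.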