论文标题

张量块模型中的精确聚类:统计最佳和计算限制

Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit

论文作者

Han, Rungang, Luo, Yuetian, Wang, Miaoyan, Zhang, Anru R.

论文摘要

高阶聚类旨在识别多路数据集中通常在神经成像,基因组学,社交网络研究等中出现的异质子结构。该问题的非凸面和不连续性在统计和计算中都构成了重大挑战。在本文中,我们提出了一个张量模型和计算有效的方法,即\ emph {高阶lloyd算法}(Hlloyd)和高阶光谱群集(HSC),用于高阶聚类。在温和的下噪声假设下,为拟议程序建立了收敛的保证和统计最佳性。在高斯张量模型下,我们完全表征了基于三种不同的信噪比制度实现高阶精确聚类的统计计算权衡。该分析依赖于高阶光谱扰动分析的新技术和张量估计中结合的``单数值无差距''误差,这与文献中的矩阵光谱分析大不相同。最后,我们通过对合成数据集和真实数据集进行了广泛的实验来展示所提出的程序的优点。

High-order clustering aims to identify heterogeneous substructures in multiway datasets that arise commonly in neuroimaging, genomics, social network studies, etc. The non-convex and discontinuous nature of this problem pose significant challenges in both statistics and computation. In this paper, we propose a tensor block model and the computationally efficient methods, \emph{high-order Lloyd algorithm} (HLloyd), and high-order spectral clustering (HSC), for high-order clustering. The convergence guarantees and statistical optimality are established for the proposed procedure under a mild sub-Gaussian noise assumption. Under the Gaussian tensor block model, we completely characterize the statistical-computational trade-off for achieving high-order exact clustering based on three different signal-to-noise ratio regimes. The analysis relies on new techniques of high-order spectral perturbation analysis and a ``singular-value-gap-free'' error bound in tensor estimation, which are substantially different from the matrix spectral analyses in the literature. Finally, we show the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源