Paper title
The SCORE normalization, especially for highly heterogeneous network and text data
Paper authors
Abstract
SCORE was introduced as a spectral approach to network community detection. Because many networks have severe degree heterogeneity, the ordinary spectral clustering (OSC) approach to community detection may perform unsatisfactorily. SCORE alleviates the effect of degree heterogeneity by introducing a new normalization idea in the spectral domain, which makes OSC more effective. SCORE is easy to use and computationally fast. It adapts easily to new directions and has seen growing interest in practice. In this paper, we review the basics of SCORE, the adaptation of SCORE to network mixed membership estimation and topic modeling, and the application of SCORE to real data, including two datasets on the publications of statisticians. We also review the theoretical 'ideology' underlying SCORE. We show that in the spectral domain, SCORE converts a simplicial cone to a simplex, and provides a simple and direct link between the simplex and network memberships. SCORE attains an exponential rate and a sharp phase transition in community detection, and achieves optimal rates in mixed membership estimation and topic modeling.
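The normalization described in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation of SCORE for community detection: take the K leading eigenvectors of the adjacency matrix, divide the 2nd through K-th eigenvectors entrywise by the first, and run k-means on the resulting ratios. The function names `score_communities` and `_kmeans`, the `log(n)` truncation threshold, and the deterministic farthest-point initialization are choices made for this illustration, not details taken from the abstract. The sketch also assumes a connected network, so that the first eigenvector has no zero entries.

```python
import numpy as np


def _kmeans(X, K, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, K):
        # Squared distance from each point to its nearest chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def score_communities(A, K, clip=None):
    """Sketch of SCORE: cluster entrywise ratios of leading eigenvectors.

    A    : symmetric (n, n) adjacency matrix (assumed connected).
    K    : number of communities, K >= 2.
    clip : truncation level for the ratios; defaults to log(n) (an assumption
           of this sketch).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # eigh handles symmetric matrices; keep the K eigenvalues largest in magnitude.
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(-np.abs(vals))[:K]
    xi = vecs[:, top]
    # SCORE normalization: dividing by the first eigenvector cancels the
    # node-specific degree parameters, leaving only community information.
    R = xi[:, 1:] / xi[:, [0]]
    T = np.log(n) if clip is None else clip
    R = np.clip(R, -T, T)
    return _kmeans(R, K)


if __name__ == "__main__":
    # Noiseless degree-corrected block structure with heterogeneous degrees.
    theta = np.array([1.0, 0.2, 0.5, 0.9, 0.3, 1.0, 0.6, 0.1])
    g = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    P = np.array([[1.0, 0.2], [0.2, 1.0]])
    Omega = np.outer(theta, theta) * P[np.ix_(g, g)]
    print(score_communities(Omega, K=2))
```

In the noiseless example the degree parameters `theta` vary by a factor of ten within each community, yet the ratio vectors take only K distinct values, one per community. This is the sense in which the normalization removes degree heterogeneity before clustering.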