论文标题
通过节点分组的图摘要:光谱算法
Graph Summarization via Node Grouping: A Spectral Algorithm
论文作者
论文摘要
通过节点分组的图形摘要是一种流行的方法,可以通过将原始图形的节点分组为超节点并将边缘编码为超中心,从而使邻接信息的丢失最小化。由于其尺寸较小和高查询处理效率,因此此类摘要在大规模图分析中具有巨大的应用。在本文中,我们将摘要的损失最小化问题重新制定为等效整数最大化问题。通过最初允许放松的(分数)溶液以进行整数最大化,我们可以分析地公开与邻接矩阵的光谱特性的基本连接。因此,我们设计了一种称为Specsumm的算法,该算法由两个阶段组成。在第一阶段,是由光谱图理论激励的,我们将k均值聚类应用于邻接矩阵的最大(幅度)特征向量,以将节点分配给超节点。在第二阶段,我们提出了一种贪婪的启发式方法,该启发式方法可以更新初始作业,以进一步提高摘要质量。最后,通过在11个数据集上进行的大量实验,我们表明Specsumm与最先进的摘要算法和量表与具有数百万个节点的图形相比,有效地产生了高质量的摘要。
Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.