论文标题
在流图上进行空间有效的随机步行
Space-Efficient Random Walks on Streaming Graphs
论文作者
论文摘要
许多应用程序中的图形(例如社交网络和物联网)本质上是流式传输,涉及以高速率的持续添加以及顶点和边缘的删除。在图形中构建随机步行,即选择具有特定概率分布的顶点的序列,是许多这些图形应用程序以及机器学习(ML)在图形结构数据上的重要任务。在流媒体场景中,随机步行需要不断跟上图形更新,以避免陈旧步行,从而在下游任务中进行性能退化。我们提出码头,该系统有效地存储和更新流媒体图上的随机步行。它通过维护压缩,高通量和低延迟数据结构来避免潜在的大小爆炸。它通过将压缩的纯粹功能性二进制树和配对功能耦合来存储步行,以及(ii)通过有效修剪步行搜索空间来实现(i)简洁的表示。在更新随机步行时,我们用真实和合成图评估码头,并通过吞吐量和延迟来评估码头。结果表明,码头高于倒立索引和基于树的基线的优势。
Graphs in many applications, such as social networks and IoT, are inherently streaming, involving continuous additions and deletions of vertices and edges at high rates. Constructing random walks in a graph, i.e., sequences of vertices selected with a specific probability distribution, is a prominent task in many of these graph applications as well as machine learning (ML) on graph-structured data. In a streaming scenario, random walks need to constantly keep up with the graph updates to avoid stale walks and thus, performance degradation in the downstream tasks. We present Wharf, a system that efficiently stores and updates random walks on streaming graphs. It avoids a potential size explosion by maintaining a compressed, high-throughput, and low-latency data structure. It achieves (i) the succinct representation by coupling compressed purely functional binary trees and pairing functions for storing the walks, and (ii) efficient walk updates by effectively pruning the walk search space. We evaluate Wharf, with real and synthetic graphs, in terms of throughput and latency when updating random walks. The results show the high superiority of Wharf over inverted index- and tree-based baselines.