论文标题
GraphTango:用于高效流式图形更新和分析的混合表示格式
GraphTango: A Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis
论文作者
论文摘要
流图处理涉及在时间不断发展的图表上执行更新和分析。基本表示格式在很大程度上决定了这些更新和分析阶段的吞吐量。现有格式通常采用哈希表或邻接列表的变体。但是,基于邻接列表的方法在重尾图上的性能较差,基于哈希的方法在短尾图上遭受了影响。我们提出了GraphTango,这是一种混合格式,无论该图的分布如何,都提供出色的更新和分析吞吐量。 GraphTango基于顶点度的三种不同格式之间的开关:i)低度顶点将边缘直接存储在邻里元数据中,限制对单个缓存线的访问,ii)中等程度的顶点使用邻接列表,以及III)使用HASH TABLECS以及邻接列表。在这种情况下,邻接列表在分析阶段提供了快速的遍历,而哈希表则在更新阶段提供了恒定的时间查找。我们通过设计一个基于开放的哈希表,进一步优化了性能,该表完全利用了所有获取的高速缓存线。此外,我们开发了一个无螺纹锁定的内存池,该池允许在多线程环境中快速生长/缩小邻接列表和哈希表。我们在SAGA基础框架的帮助下评估了GraphTango,并将其与其他四种表示格式进行了比较。平均而言,GraphTango提供4.5倍的插入吞吐量,删除吞吐量更高3.2倍,而在下一个最佳格式上的分析吞吐量高1.1倍。此外,我们将GraphTango与最新的图形处理框架Dzig和Risgraph集成在一起。与Vanilla Dzig和Vanilla Risgraph,[GraphTango + Dzig]和[GraphTango + Risgraph]相比,平均批处理时间分别减少了2.3倍和1.5倍。
Streaming graph processing involves performing updates and analytics on a time-evolving graph. The underlying representation format largely determines the throughputs of these updates and analytics phases. Existing formats usually employ variations of hash tables or adjacency lists. However, adjacency-list-based approaches perform poorly on heavy-tailed graphs, and the hash-based approaches suffer on short-tailed graphs. We propose GraphTango, a hybrid format that provides excellent update and analytics throughput regardless of the graph's degree distribution. GraphTango switches among three different formats based on a vertex's degree: i) Low-degree vertices store the edges directly with the neighborhood metadata, confining accesses to a single cache line, ii) Medium-degree vertices use adjacency lists, and iii) High-degree vertices use hash tables as well as adjacency lists. In this case, adjacency list provides fast traversal during the analytics phase, while the hash table provides constant-time lookups during the update phase. We further optimized the performance by designing an open-addressing-based hash table that fully utilizes every fetched cache line. In addition, we developed a thread-local lock-free memory pool that allows fast growing/shrinking of the adjacency lists and hash tables in a multi-threaded environment. We evaluated GraphTango with the help of the SAGA-Bench framework and compared it with four other representation formats. On average, GraphTango provides 4.5x higher insertion throughput, 3.2x higher deletion throughput, and 1.1x higher analytics throughput over the next best format. Furthermore, we integrated GraphTango with the state-of-the-art graph processing frameworks DZiG and RisGraph. Compared to the vanilla DZiG and vanilla RisGraph, [GraphTango + DZiG] and [GraphTango + RisGraph] reduces the average batch processing time by 2.3x and 1.5x, respectively.