Paper Title

Aion: Better Late than Never in Event-Time Streams

Authors

Sérgio Esteves, Gianmarco De Francisci Morales, Rodrigo Rodrigues, Marco Serafini, Luís Veiga

Abstract

Processing data streams in near real-time is an increasingly important task. In the case of event-timestamped data, the stream processing system must promptly handle late events that arrive after the corresponding window has been processed. To enable this late processing, the window state must be maintained for a long period of time. However, current systems maintain this state in memory, which either imposes a maximum period of tolerated lateness, or causes the system to degrade performance or even crash when the system memory runs out. In this paper, we propose AION, a comprehensive solution for handling late events in an efficient manner, implemented on top of Flink. In designing AION, we go beyond a naive solution that transfers state between memory and persistent storage on demand. In particular, we introduce a proactive caching scheme, where we leverage the semantics of stream processing to anticipate the need for bringing data to memory. Furthermore, we propose a predictive cleanup scheme to permanently discard window state based on the likelihood of receiving more late events, to prevent storage consumption from growing without bounds. Our evaluation shows that AION is capable of maintaining sustainable levels of memory utilization while still preserving high throughput, low latency, and low staleness.
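
The three mechanisms named in the abstract (moving window state between memory and persistent storage, proactive caching, and predictive cleanup) can be illustrated with a minimal sketch. This is not AION's implementation and does not use Flink's API: the class `TieredWindowStore`, its method names, the `disk` dictionary standing in for persistent storage, and the exponential delay model in `exponential_late_probability` are all assumptions made for illustration.

```python
import math


class TieredWindowStore:
    """Toy two-tier store for window state (hypothetical, not AION's design)."""

    def __init__(self):
        self.memory = {}  # window_end -> list of events held in memory
        self.disk = {}    # stand-in for persistent storage

    def add(self, window_end, event):
        """Add an event to its window, loading spilled state on demand."""
        if window_end not in self.memory:
            # Naive on-demand path: pull the window back from storage,
            # or start a fresh window if no state exists anywhere.
            self.memory[window_end] = self.disk.pop(window_end, [])
        self.memory[window_end].append(event)

    def spill(self, watermark):
        """Move windows the watermark has already passed to storage."""
        for end in [e for e in self.memory if e <= watermark]:
            self.disk[end] = self.memory.pop(end)

    def prefetch(self, window_ends):
        """Proactive caching: load windows expected to get late events soon."""
        for end in window_ends:
            if end in self.disk:
                self.memory[end] = self.disk.pop(end)

    def cleanup(self, watermark, late_probability, threshold):
        """Predictive cleanup: permanently drop spilled windows whose
        estimated chance of receiving more late events is below threshold."""
        stale = [e for e in self.disk
                 if late_probability(watermark - e) < threshold]
        for end in stale:
            del self.disk[end]


def exponential_late_probability(age, mean_delay=30.0):
    """Assumed model: probability that an event is delayed by more than `age`."""
    return math.exp(-age / mean_delay)
```

In this sketch, a caller would invoke `spill` as the watermark advances, `prefetch` with whatever windows its own predictor expects to receive late events, and `cleanup` periodically to bound storage growth. How AION actually predicts late arrivals and schedules these steps is described in the paper, not here.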
