Paper title
A Synopses Data Engine for Interactive Extreme-Scale Analytics
Paper authors
Paper abstract
In this work, we detail the design and structure of a Synopses Data Engine (SDE) that combines the virtues of parallel processing and stream summarization to deliver interactive analytics at extreme scale. Our SDE is built on top of Apache Flink and implements a synopsis-as-a-service paradigm. In doing so, it achieves (a) concurrent maintenance of thousands of synopses of various types for thousands of streams on demand, (b) reuse of maintained synopses among various concurrent workflows, (c) data summarization facilities even for cross-(Big Data) platform workflows, (d) pluggability of new synopses on the fly, and (e) increased potential for workflow execution optimization. The proposed SDE is useful for interactive analytics at extreme scale because it enables (i) enhanced horizontal scalability, i.e., not only scaling out the computation to the processing units available in a computer cluster, but also harnessing the processing load assigned to each unit by operating on carefully crafted data summaries; (ii) vertical scalability, i.e., scaling the computation to very high numbers of processed streams; and (iii) federated scalability, i.e., scaling the computation beyond single clusters and clouds by controlling the communication required to answer global queries posed over a number of potentially geo-dispersed clusters.
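The abstract does not show the SDE's interface, so the following is only a minimal sketch of the ideas it names, not the engine's actual API or its Flink implementation. It assumes a Count-Min sketch as one representative synopsis type and a hypothetical `SynopsisRegistry` class standing in for the synopsis-as-a-service layer. The class names, the stream id `"trades"`, and the synopsis name `"freq"` are all invented for illustration. The sketch shows on-demand creation, reuse of a maintained synopsis by a second requester, and merging of per-cluster summaries so that only compact synopses, not raw streams, need to cross cluster boundaries.

```python
import hashlib
from typing import Dict, List, Tuple


class CountMinSketch:
    """Approximate frequency counter; estimates never undercount the true count."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # One deterministic hash per row, obtained by salting with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count

    def estimate(self, item: str) -> int:
        # The minimum over rows is the least collision-inflated counter.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        # Sketches with identical dimensions and hashes merge by cell-wise addition.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share dimensions to be merged")
        merged = CountMinSketch(self.width, self.depth)
        for r in range(self.depth):
            merged.rows[r] = [a + b for a, b in zip(self.rows[r], other.rows[r])]
        return merged


class SynopsisRegistry:
    """Hypothetical stand-in for a synopsis service: one synopsis per (stream, name)."""

    def __init__(self) -> None:
        self._synopses: Dict[Tuple[str, str], CountMinSketch] = {}

    def get_or_create(self, stream_id: str, name: str) -> CountMinSketch:
        # A second workflow asking for the same key reuses the maintained synopsis.
        key = (stream_id, name)
        if key not in self._synopses:
            self._synopses[key] = CountMinSketch()
        return self._synopses[key]


# Two registries play the role of two geo-dispersed clusters (illustrative data).
cluster_a = SynopsisRegistry()
cluster_b = SynopsisRegistry()
for symbol in ["AAPL", "AAPL", "MSFT"]:
    cluster_a.get_or_create("trades", "freq").add(symbol)
for symbol in ["AAPL", "GOOG"]:
    cluster_b.get_or_create("trades", "freq").add(symbol)

# Answer a global query by shipping and merging the summaries only.
global_view = cluster_a.get_or_create("trades", "freq").merge(
    cluster_b.get_or_create("trades", "freq")
)
print(global_view.estimate("AAPL"))  # an upper-biased estimate, at least 3
```

In this sketch the communication cost of the global query is bounded by the sketch size (`width * depth` counters per cluster) regardless of how many tuples each stream has seen, which is the intuition behind the federated scalability claim. How the actual SDE partitions, routes, and merges synopses is not described in the abstract.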