Paper Title
A Data Streaming Process Framework for Autonomous Driving at the Edge
Authors
Abstract
In recent years, with the rapid development of sensing technology and the Internet of Things (IoT), sensors have played increasingly important roles in traffic control, medical monitoring, industrial production, etc. They generate high volumes of data in a streaming fashion that often need to be processed in real time. Streaming data computing technology therefore plays an indispensable role in processing sensor data with high throughput and low latency. To address these problems, the proposed framework is implemented on top of Spark Streaming; it builds a grey-model-based traffic flow monitor, a traffic-prediction-oriented prediction layer, and a fuzzy-control-based Batch Interval dynamic adjustment layer for Spark Streaming. It can forecast variations in the sensor data arrival rate, adjust the streaming Batch Interval in advance, and carry out real-time stream processing at the edge. The framework can thus monitor and predict changes in the data flow of autonomous-driving vehicle sensors within the geographical coverage of an edge computing node, while minimizing end-to-end latency and satisfying the application's throughput requirements. Experiments show that it can predict short-term traffic over a whole day with no more than 4% relative error. By keeping the batch consumption rate close to the data generation rate, it maintains system stability well even when the data arrival rate changes rapidly; the Batch Interval converges to a suitable value within two minutes when the data arrival rate doubles. Compared with vanilla Spark Streaming, which suffers from serious task accumulation and introduces large delays, the framework reduces latency by 35% by squeezing the Batch Interval when the data arrival rate is low, and significantly improves system throughput with at most a 25% Batch Interval increase when the data arrival rate is high.
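The grey-model traffic monitor described above is presumably based on the classic GM(1,1) grey forecasting model, which fits a short, non-negative time series via an accumulated generating operation and an exponential time response. The sketch below is illustrative only; the function name, series values, and interface are assumptions, not taken from the paper.

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """Fit a GM(1,1) grey model to a short non-negative series x0
    and forecast the next `steps` values (illustrative sketch)."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                    # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])         # background values: mean of adjacent AGO points
    # Least-squares fit of x0(k) + a*z1(k) = b for the development
    # coefficient a and grey input b
    B = np.column_stack((-z1, np.ones(n - 1)))
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]

    # Time response: x1_hat(k) = (x0(0) - b/a) * exp(-a*k) + b/a
    def x1_hat(k):
        return (x0[0] - b / a) * np.exp(-a * k) + b / a

    # Inverse AGO: differences of consecutive x1_hat values give x0 forecasts
    return np.array([x1_hat(n + i) - x1_hat(n + i - 1) for i in range(steps)])
```

For a series growing roughly 10% per step, e.g. `gm11_forecast([100, 110, 121, 133.1], steps=1)`, the model returns a next-step forecast close to 146, matching the exponential trend; in the framework's setting the series would be recent traffic-flow measurements, and the forecast would drive the fuzzy Batch Interval adjustment ahead of the load change.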