Paper Title

An Incremental Clustering Method for Anomaly Detection in Flight Data

Authors

Zhao, Weizun; Li, Lishuai; Alam, Sameer; Wang, Yanjun

Abstract

Safety is a top priority for civil aviation. New anomaly detection methods, primarily clustering methods, have been developed to monitor pilot operations and detect risks from flight data. However, all existing anomaly detection methods rely on offline learning: the models are trained once using historical data and used for all future predictions. In practice, new flight data accumulate continuously and are analyzed every month at airlines. Clustering such dynamically growing data is challenging for an offline method because re-training the model every time new data come in is memory and time intensive. If the model is not re-trained, false alarms or missed detections may increase since the model cannot reflect changes in data patterns. To address this problem, we propose a novel incremental anomaly detection method based on the Gaussian Mixture Model (GMM) to identify common patterns and detect outliers in flight operations from digital flight data. It is a probabilistic clustering model of flight operations that can incrementally update its clusters based on new data rather than re-clustering all data from scratch. It trains an initial GMM model on historical offline data. Then, it continuously adapts to new incoming data points via an expectation-maximization (EM) algorithm. To track changes in flight operation patterns, only model parameters need to be saved. The proposed method was tested on three sets of simulation data and two sets of real-world flight data. Compared with the traditional offline GMM method, the proposed method can generate similar clustering results with significantly reduced processing time (57%-99% time reduction in testing sets) and memory usage (91%-95% memory usage reduction in testing sets). Preliminary results indicate that the incremental learning scheme is effective in dealing with dynamically growing data in flight data analytics.
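The abstract does not give the update equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes one-dimensional data and a common online-EM simplification: each component is stored as running sufficient statistics (soft count, sum of x, sum of x squared), and a new batch is folded in with a single E-step under the current parameters. The names `IncrementalGMM1D`, `fit_offline`, and the density threshold are hypothetical and chosen for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class IncrementalGMM1D:
    """1-D GMM stored as per-component sufficient statistics (illustrative only)."""

    def __init__(self, weights, means, variances, n_seen):
        # Per component: soft count, sum of x, sum of x^2.
        self.n = [w * n_seen for w in weights]
        self.sx = [n * m for n, m in zip(self.n, means)]
        self.sxx = [n * (v + m * m) for n, m, v in zip(self.n, means, variances)]

    def params(self):
        """M-step: recover (weights, means, variances) from the statistics."""
        total = sum(self.n)
        weights = [n / total for n in self.n]
        means = [sx / n for sx, n in zip(self.sx, self.n)]
        variances = [max(sxx / n - m * m, 1e-6)  # floor avoids collapse
                     for sxx, n, m in zip(self.sxx, self.n, means)]
        return weights, means, variances

    def density(self, x):
        """Mixture density at x; low values suggest an outlier."""
        weights, means, variances = self.params()
        return sum(w * gaussian_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances))

    def update(self, batch):
        """Fold a new batch into the model without revisiting old data."""
        weights, means, variances = self.params()  # frozen during the E-step
        for x in batch:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            if total == 0.0:
                continue  # too far from every component to assign
            for j, p in enumerate(joint):
                r = p / total  # responsibility of component j for x
                self.n[j] += r
                self.sx[j] += r * x
                self.sxx[j] += r * x * x


def fit_offline(data, init_means, n_iter=30):
    """Plain batch EM on historical data, used to build the initial model."""
    k = len(init_means)
    model = IncrementalGMM1D([1.0 / k] * k, list(init_means), [1.0] * k, n_seen=1e-9)
    for _ in range(n_iter):
        w, m, v = model.params()
        # A near-zero pseudo-count discards the previous iteration's statistics.
        model = IncrementalGMM1D(w, m, v, n_seen=1e-9)
        model.update(data)
    return model


# Synthetic stand-in for flight data: two operating patterns, one of which drifts.
random.seed(0)
historical = ([random.gauss(0.0, 1.0) for _ in range(300)]
              + [random.gauss(10.0, 1.0) for _ in range(300)])
model = fit_offline(historical, init_means=[1.0, 9.0])
means_before = model.params()[1]

new_batch = [random.gauss(12.0, 1.0) for _ in range(300)]
model.update(new_batch)  # only the stored statistics are needed here
means_after = model.params()[1]

THRESHOLD = 1e-4  # hypothetical density cut-off for flagging outliers
flags = {x: model.density(x) < THRESHOLD for x in (0.0, 50.0)}
```

In this sketch the historical samples can be discarded after `fit_offline`, because `update` reads only the three statistics per component. That is the property the abstract relies on when it says only model parameters need to be saved. A fuller version would handle multivariate data and decide when to add or merge components, which this example does not attempt.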
