Title

Graph-based process mining

Author

Jalali, Amin

Abstract

Process mining is a research area that supports discovering information about business processes from their execution event logs. The growing volume of event logs in organizations challenges current process mining techniques, which tend to load data into a computer's memory. This limits organizations' ability to apply process mining at a large scale and introduces risks due to the lack of data management capabilities. Therefore, this paper introduces and formalizes a new approach to storing event logs in, and retrieving them from, graph databases. It defines an algorithm that computes the Directly Follows Graph (DFG) inside the graph database, shifting the computationally heavy parts of process mining into the database. Calculating the DFG inside a graph database makes it possible to leverage the database's horizontal and vertical scaling capabilities to apply process mining at a large scale. It also removes the need to move data to analysts' computers, so the data management capabilities of the graph database can be used. We implemented this approach in Neo4j and evaluated its performance against current techniques using a real-life log file. The results show that our approach can calculate the DFG even when the data is much larger than the available memory. It also performs better when the data is diced into small chunks.
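To make concrete what a DFG counts, here is a minimal in-memory Python sketch, assuming a hypothetical event schema of (case_id, activity, timestamp). It groups events into traces by case, orders each trace by timestamp, and counts every pair (a, b) where activity b directly follows activity a in the same case. The names `Event`, `compute_dfg`, `dice`, and `EXAMPLE_LOG` are illustrative assumptions, and `dice` is only a simple time-window filter standing in for dicing. This is exactly the kind of in-memory computation the paper aims to move into the graph database; it does not reproduce the paper's graph schema or its Neo4j queries.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    """One event-log row (hypothetical minimal schema, not the paper's)."""
    case_id: str
    activity: str
    timestamp: int  # any sortable value, e.g. epoch seconds


def compute_dfg(events: Iterable[Event]) -> Counter:
    """Count directly-follows pairs (a, b): b immediately follows a within one case."""
    traces: Dict[str, List[Event]] = defaultdict(list)
    for e in events:
        traces[e.case_id].append(e)  # group events into traces by case
    dfg: Counter = Counter()
    for trace in traces.values():
        # Order each trace by time; list.sort is stable, so ties keep input order.
        trace.sort(key=lambda e: e.timestamp)
        for prev, nxt in zip(trace, trace[1:]):
            dfg[(prev.activity, nxt.activity)] += 1
    return dfg


def dice(events: Iterable[Event], start: int, end: int) -> List[Event]:
    """Hypothetical 'dice': keep only events with start <= timestamp < end."""
    return [e for e in events if start <= e.timestamp < end]


# Tiny illustrative log (made-up data, not from the paper's evaluation).
EXAMPLE_LOG = [
    Event("c1", "register", 1),
    Event("c1", "check", 2),
    Event("c1", "pay", 3),
    Event("c2", "register", 1),
    Event("c2", "pay", 4),
]

if __name__ == "__main__":
    print(compute_dfg(EXAMPLE_LOG))
    print(compute_dfg(dice(EXAMPLE_LOG, 1, 3)))
```

Slicing the log first and then counting mirrors, in miniature, computing the DFG over a diced subset; in the paper's setting, the grouping, ordering, and pair counting would instead run inside Neo4j rather than in the analyst's process.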