使用串联重复的近似对齐计算进行有效的一致性检查

论文标题

使用串联重复的近似对齐计算进行有效的一致性检查

Efficient Conformance Checking using Approximate Alignment Computation with Tandem Repeats

论文作者

Reißner, Daniel, Armas-Cervantes, Abel, La Rosa, Marcello

论文摘要

一致性检查涵盖了一系列过程挖掘技术，旨在查找和描述捕获预期过程行为的过程模型与记录观察到的行为的相应事件日志之间的差异。对齐是一种已建立的技术，可以计算事件日志中的迹线与相应过程模型的最接近执行跟踪之间的距离。鉴于成本函数，当对数跟踪和模型跟踪之间包含最少数量的不匹配时，对齐是最佳的。但是，确定最佳一致性在计算上是昂贵的，尤其是鉴于实践的事件日志的规模和复杂性的增长，这很容易超过一百万个事件，并具有数百个活动的痕迹。现有对齐技术的一个共同局限性是无法利用日志中的重复。通过在迹线中利用特定形式的顺序模式，即串联重复，我们提出了一种新颖的近似技术，该技术使用预处理和后处理步骤来压缩痕量的长度并重新计算对齐成本，同时保证成本结果从未超过耐受的成本降低最佳成本。在具有50个现实生活模型对日志和六种最先进的对准技术的广泛经验评估中，我们表明，在有重复的痕迹的情况下，所提出的压缩方法系统地超过了基本线，并且在成本过高的情况下，当它出现时，当它出现时，就会出现。

Conformance checking encompasses a body of process mining techniques which aim to find and describe the differences between a process model capturing the expected process behavior and a corresponding event log recording the observed behavior. Alignments are an established technique to compute the distance between a trace in the event log and the closest execution trace of a corresponding process model. Given a cost function, an alignment is optimal when it contains the least number of mismatches between a log trace and a model trace. Determining optimal alignments, however, is computationally expensive, especially in light of the growing size and complexity of event logs from practice, which can easily exceed one million events with traces of several hundred activities. A common limitation of existing alignment techniques is the inability to exploit repetitions in the log. By exploiting a specific form of sequential pattern in traces, namely tandem repeats, we propose a novel approximate technique that uses pre- and post-processing steps to compress the length of a trace and recomputes the alignment cost while guaranteeing that the cost result never under-approximates the optimal cost. In an extensive empirical evaluation with 50 real-life model log pairs and against six state-of-the-art alignment techniques, we show that the proposed compression approach systematically outperforms the baselines by up to an order of magnitude in the presence of traces with repetitions, and that the cost over-approximation, when it occurs, is negligible.

下载PDF全文

下载文献需遵守相关版权规定

论文标题