论文标题
HUSP-SP:序列数据上更快的公用事业开采
HUSP-SP: Faster Utility Mining on Sequence Data
论文作者
论文摘要
高纯粹的顺序模式采矿(HUSPM)由于其广泛的应用和广泛的知名度而成为一个重要主题。但是,由于HUSPM问题遇到低实用程序阈值或大规模数据时,由于搜索空间的组合爆炸,因此可以耗时且记忆力耗费以解决HUSPM问题。已经提出了几种算法来解决此问题,但是在运行时间和内存使用方面,它们仍然花费很多。在本文中,为了进一步解决这个问题,我们设计了一种称为序列投影(SEQPRO)的紧凑结构,并提出了有效的算法,即使用SEQPRO结构(HUSP-SP)发现了高意外序列模式。 HUSP-SP利用紧凑型SEQ阵列将必要的信息存储在序列数据库中。 SEQPRO结构旨在有效计算候选模式的实用程序和上限值。此外,利用了一种新的对效用的上限,即更严格的序列实用程序(TRSU)和搜索空间中的两种修剪策略,用于改善HUSP-SP的采矿性能。合成和现实生活数据集的实验结果表明,在运行时间,内存使用情况,搜索空间修剪效率和可扩展性方面,HUSP-SP可以显着优于最先进的算法。
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addressing this problem, but they still cost a lot in terms of running time and memory usage. In this paper, to further solve this problem efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU) and two pruning strategies in search space, are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.