Paper Title
On-shelf Utility Mining of Sequence Data
Authors
Abstract
Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods are biased toward items with longer on-shelf time, because such items have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this paper, we focus on OSUM of sequence data, where the sequential database is divided into several partitions according to time periods, and each item is associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. To improve efficiency, we also design several strategies that reduce the search space and avoid redundant calculation using two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, we develop two novel data structures to facilitate the calculation of upper bounds and utilities. Extensive experiments on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.
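The on-shelf bias described in the abstract can be illustrated with a small sketch. It is not the OSUMS or OSUMS+ algorithm, and it does not implement TPEU or TRSU. It only assumes a common OSUM-style definition: a pattern's utility is divided by the total utility of the periods in which all of its items are on shelf. The data layout (`db` keyed by period, `shelf` mapping items to periods), the single-item-per-event simplification, and the function names are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

Event = Tuple[str, int]      # (item, utility of this occurrence); assumed layout
QSequence = List[Event]      # one ordered sequence; one item per event for simplicity


def pattern_utility(seq: QSequence, pattern: Sequence[str]) -> int:
    """Maximum utility over all occurrences of `pattern` as a subsequence of `seq`.

    Returns 0 when the pattern does not occur in the sequence.
    """
    neg = float("-inf")
    # best[k] = best utility of any match of the first k pattern items so far
    best = [0] + [neg] * len(pattern)
    for item, util in seq:
        # Walk k downwards so one event extends at most one position per match.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item and best[k - 1] != neg:
                best[k] = max(best[k], best[k - 1] + util)
    return 0 if best[-1] == neg else int(best[-1])


def on_shelf_relative_utility(
    db: Dict[int, List[QSequence]],
    shelf: Dict[str, Set[int]],
    pattern: Sequence[str],
) -> float:
    """Pattern utility divided by the total utility of its on-shelf periods.

    Assumption: the pattern is on shelf only in periods where every one of
    its items is on shelf (intersection of the items' periods).
    """
    periods = set.intersection(*(shelf[i] for i in pattern))
    total = sum(u for p in periods for s in db.get(p, []) for _, u in s)
    if total == 0:
        return 0.0
    gained = sum(pattern_utility(s, pattern) for p in periods for s in db.get(p, []))
    return gained / total


# Toy database: two time periods, two sequences each.
db = {
    1: [[("a", 5), ("b", 2)], [("a", 3)]],
    2: [[("a", 4), ("b", 6), ("c", 1)], [("c", 9), ("b", 1)]],
}
# Item "c" is on shelf only in period 2.
shelf = {"a": {1, 2}, "b": {1, 2}, "c": {2}}

ratio_ab = on_shelf_relative_utility(db, shelf, ["a", "b"])
ratio_c = on_shelf_relative_utility(db, shelf, ["c"])
```

In this toy data, the pattern `c` is normalized by period 2 alone rather than by the whole database, so its short time on shelf does not count against it. A threshold on this ratio would then play the role of the minimum utility threshold; the exact definitions used by OSUMS may differ.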