Paper title
Matrix Profile XXVII: A Novel Distance Measure for Comparing Long Time Series
Paper authors
Paper abstract
The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series, Euclidean Distance and Dynamic Time Warping distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such comparisons is less clear. For example, on two separate days the telemetry from an athlete's workout routine might be very similar. On the second day, the athlete may change the order of push-ups and squats, add repetitions of pull-ups, or omit dumbbell curls entirely. Any of these minor changes would defeat existing time series distance measures. Some bag-of-features methods have been proposed to address this problem, but we argue that in many cases, similarity is intimately tied to the shapes of subsequences within these longer time series. In such cases, summative features will lack discrimination ability. In this work we introduce PRCIS, which stands for Pattern Representation Comparison in Series. PRCIS is a distance measure for long time series that exploits recent progress in our ability to summarize time series with dictionaries. We demonstrate the utility of our ideas on diverse tasks and datasets.