Paper title
Two-Sample Testing for Event Impacts in Time Series
Authors
Abstract
In many application domains, time series are monitored to detect extreme events such as technical faults, natural disasters, or disease outbreaks. Unfortunately, it is often non-trivial to select both a time series that is informative about the events and a powerful detection algorithm: detection may fail because the detection algorithm is unsuitable, or because the time series shares no information with the events of interest. In this work, we therefore propose a non-parametric statistical test for shared information between a time series and a series of observed events. Our test makes it possible to identify time series that carry information on event occurrences without committing to a specific event detection method. In a nutshell, we use a multiple two-sample testing approach to test whether the value distribution of the time series diverges at increasing lags after event occurrences. In contrast to related tests, our approach is applicable to time series over arbitrary domains, including multivariate numeric data, strings, or graphs. A large-scale simulation study shows that, on univariate time series, our test outperforms or is on par with related tests on this task. We also demonstrate the real-world applicability of our approach on datasets from social media and smart home environments.
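The abstract only outlines the procedure, so the following is a minimal sketch of the general idea under assumptions that are ours, not the paper's. The reference sample is taken to be the time points not within `max_lag` steps after any event. Each per-lag comparison uses a kernel maximum mean discrepancy (MMD) permutation test. The per-lag p-values are combined with a Bonferroni correction. The names `lagged_event_test`, `mmd_permutation_test`, `max_reference` and `gaussian_kernel` are hypothetical. Because the data are only accessed through `kernel`, swapping `gaussian_kernel` for a kernel on vectors, strings, or graphs is what would let the same sketch handle non-numeric series.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # element type of the series: float, vector, string, graph, ...


def gaussian_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    """RBF kernel for scalar values (assumed choice); swap in a kernel for your domain."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def _mmd2(gram: List[List[float]], ix: Sequence[int], iy: Sequence[int]) -> float:
    """Biased squared-MMD estimate from a precomputed Gram matrix."""
    kxx = sum(gram[i][j] for i in ix for j in ix) / len(ix) ** 2
    kyy = sum(gram[i][j] for i in iy for j in iy) / len(iy) ** 2
    kxy = sum(gram[i][j] for i in ix for j in iy) / (len(ix) * len(iy))
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(x: Sequence[T], y: Sequence[T],
                         kernel: Callable[[T, T], float],
                         n_perm: int, rng: random.Random) -> float:
    """Permutation p-value for H0: x and y come from the same distribution."""
    pooled = list(x) + list(y)
    gram = [[kernel(a, b) for b in pooled] for a in pooled]  # computed once
    idx = list(range(len(pooled)))
    observed = _mmd2(gram, idx[:len(x)], idx[len(x):])
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # random relabelling of the pooled sample
        if _mmd2(gram, idx[:len(x)], idx[len(x):]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def lagged_event_test(series: Sequence[T], event_times: Sequence[int], max_lag: int,
                      kernel: Callable[[T, T], float], alpha: float = 0.05,
                      n_perm: int = 200, max_reference: int = 50,
                      seed: int = 0) -> Tuple[bool, List[float]]:
    """Hypothetical helper: one two-sample test per lag, Bonferroni-combined."""
    rng = random.Random(seed)
    n = len(series)
    # Assumed reference sample: time points not within max_lag steps after an event.
    affected = {t + k for t in event_times for k in range(max_lag + 1)}
    reference = [series[i] for i in range(n) if i not in affected]
    if len(reference) > max_reference:
        reference = rng.sample(reference, max_reference)  # keeps the sketch cheap
    p_values: List[float] = []
    for lag in range(1, max_lag + 1):
        after = [series[t + lag] for t in event_times if t + lag < n]
        if not after:
            p_values.append(1.0)  # no observations at this lag
            continue
        p_values.append(mmd_permutation_test(after, reference, kernel, n_perm, rng))
    # Bonferroni over the max_lag tests: reject if any p-value <= alpha / max_lag.
    return min(p_values) <= alpha / max_lag, p_values


if __name__ == "__main__":
    rng = random.Random(1)
    events = list(range(20, 280, 25))
    series = [rng.gauss(0.0, 1.0) for _ in range(300)]
    for t in events:  # synthetic impact: level shift at lags 1 and 2 only
        series[t + 1] += 3.0
        series[t + 2] += 3.0
    reject, p = lagged_event_test(series, events, max_lag=3, kernel=gaussian_kernel)
    print(reject, [round(v, 3) for v in p])
```

In the synthetic usage example, the shift at lags 1 and 2 should yield small p-values at those lags, while lag 3 is left unshifted. Bonferroni is used here only because it is the simplest valid correction for multiple tests. A less conservative combination, or a different reference-sample construction, may be closer to what the paper actually does.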