Title
Ensuring Learning Guarantees on Concept Drift Detection with Statistical Learning Theory
Authors
Abstract
Concept Drift (CD) detection aims to continuously identify changes in the behavior of data streams, supporting researchers in the study and modeling of real-world phenomena. Motivated by the lack of learning guarantees in current CD algorithms, we use Statistical Learning Theory (SLT) to formalize the requirements needed to ensure probabilistic learning bounds, so that detected drifts reflect actual changes in the data rather than chance. As discussed throughout this paper, relying on SLT bounds requires a set of mathematical assumptions to hold, and these assumptions are especially controversial in CD scenarios. To address this issue, we propose a methodology for satisfying those assumptions in CD scenarios and thereby ensuring learning guarantees. In addition, we assess a set of relevant and well-known CD algorithms from the literature in light of our methodology. As our main contribution, we expect this work to support researchers in designing and evaluating CD algorithms in different domains.