Paper Title
Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors
Paper Authors
Paper Abstract
Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive relationship between covariates $\mathbf{X}$ and a response variable $Y$ and can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the Fisher score vector, defined as the gradient of the log-likelihood for the fitted model, using a form of multivariate exponentially weighted moving average, which monitors for general changes in the mean of a random vector. In spite of the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach has not been previously considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes as shown in theory and experiments, and inherent diagnostic capabilities for helping to identify the nature of the changes.
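The monitoring idea in the abstract can be illustrated with a minimal sketch. It assumes a logistic regression model, for which the per-observation Fisher score is $(y - \sigma(\mathbf{x}^\top\beta))\,\mathbf{x}$, and applies the standard MEWMA recursion $\mathbf{z}_t = \lambda \mathbf{s}_t + (1-\lambda)\mathbf{z}_{t-1}$ with the statistic $T_t^2 = \mathbf{z}_t^\top \Sigma_z^{-1} \mathbf{z}_t$, where $\Sigma_z = \frac{\lambda}{2-\lambda}\Sigma_s$ is the asymptotic covariance of $\mathbf{z}_t$. The helper names (`fit_logistic`, `logistic_scores`, `mewma_monitor`, `simulate`), the weight $\lambda = 0.1$, the simulated drift, and the shortcut of estimating $\Sigma_s$ from in-sample training scores are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def sigmoid(u):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-u))


def fit_logistic(X, y, n_iter=50):
    """Unregularized logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                          # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # negative Hessian (Fisher information)
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def logistic_scores(X, y, beta):
    """Per-observation score vectors: d/d(beta) log p(y | x; beta) = (y - p) x."""
    p = sigmoid(X @ beta)
    return (y - p)[:, None] * X


def mewma_monitor(scores, cov_s, lam=0.1):
    """MEWMA T^2 statistics for a stream of score vectors.

    z_t = lam * s_t + (1 - lam) * z_{t-1}, z_0 = 0, standardized with the
    asymptotic covariance lam / (2 - lam) * cov_s of z_t.
    """
    cov_z_inv = np.linalg.inv(lam / (2.0 - lam) * cov_s)
    z = np.zeros(scores.shape[1])
    stats = np.empty(len(scores))
    for t, s in enumerate(scores):
        z = lam * s + (1.0 - lam) * z
        stats[t] = z @ cov_z_inv @ z
    return stats


# Hypothetical simulation: intercept plus two Gaussian covariates.
rng = np.random.default_rng(0)


def simulate(n, beta):
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    y = rng.binomial(1, sigmoid(X @ beta))
    return X, y


beta_train = np.array([0.5, 1.0, -1.0])
X_tr, y_tr = simulate(2000, beta_train)
beta_hat = fit_logistic(X_tr, y_tr)

# Score covariance estimated from the training scores (a simplification).
cov_s = np.cov(logistic_scores(X_tr, y_tr, beta_hat), rowvar=False)

# Stream: 200 points from the training concept, then 200 after a drift
# that flips the sign of both slopes.
X0, y0 = simulate(200, beta_train)
X1, y1 = simulate(200, np.array([0.5, -1.0, 1.0]))
stream = logistic_scores(np.vstack([X0, X1]), np.concatenate([y0, y1]), beta_hat)

t2 = mewma_monitor(stream, cov_s, lam=0.1)
print("mean T^2 before drift:", t2[:200].mean())
print("mean T^2 after drift: ", t2[200:].mean())
```

Under the training concept the score vectors have approximately zero mean at the fitted parameters, so $T_t^2$ stays near its in-control level; after the drift the score mean moves away from zero and $T_t^2$ rises. An alarm would be raised when $T_t^2$ crosses a control limit, usually calibrated by simulation to a target in-control average run length; no limit is chosen in this sketch. Because each coordinate of $\mathbf{z}_t$ corresponds to one model parameter, checking which coordinates moved is one way to read the diagnostic information that the score vector carries.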