Paper Title

Importance Weighting Correction of Regularized Least-Squares for Covariate and Target Shifts

Authors

Gogolashvili, Davit

Abstract


In many real-world problems, the training data and test data have different distributions. This situation is commonly referred to as a dataset shift. The most common settings for dataset shift considered in the literature are {\em covariate shift} and {\em target shift}. Importance weighting (IW) correction is a universal method for correcting the bias present in learning scenarios under dataset shift. The question one may ask is: does IW correction work equally well for different dataset shift scenarios? By investigating the generalization properties of weighted kernel ridge regression (W-KRR) under covariate and target shifts, we show that the answer is negative, except when the IW is bounded and the model is well-specified. In the latter case, minimax optimal rates are achieved by importance-weighted kernel ridge regression (IW-KRR) in both covariate and target shift scenarios. Slightly relaxing the boundedness condition on the IW, we show that IW-KRR still achieves the optimal rates under target shift while leading to slower rates under covariate shift. In the case of model misspecification, we show that the performance of W-KRR under covariate shift can be substantially improved by designing an alternative reweighting function. The distinction between misspecified and well-specified scenarios does not seem to be crucial in learning problems under target shift.
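To make the IW-KRR estimator discussed in the abstract concrete, here is a minimal sketch, not the paper's implementation. It assumes a simple Gaussian covariate-shift setup (train density N(0,1), test density N(1,1)) so that the importance weights w(x) = p_test(x)/p_train(x) have a closed form; the weighted KRR dual coefficients solve (W K + n λ I) α = W y for the kernel matrix K and diagonal weight matrix W.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def iw_krr_fit(X, y, w, lam=1e-2, gamma=1.0):
    # Importance-weighted kernel ridge regression:
    #   minimize  sum_i w_i (f(x_i) - y_i)^2 + n * lam * ||f||_H^2,
    # whose dual coefficients solve (W K + n*lam*I) alpha = W y.
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    W = np.diag(w)
    alpha = np.linalg.solve(W @ K + n * lam * np.eye(n), W @ y)
    return alpha

# Covariate shift: training inputs from N(0,1), test inputs from N(1,1).
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=200)

# Importance weights p_test(x)/p_train(x) for these two Gaussians.
w = np.exp(-0.5 * ((X[:, 0] - 1.0) ** 2 - X[:, 0] ** 2))

alpha = iw_krr_fit(X, y, w)
X_test = rng.normal(1.0, 1.0, size=(100, 1))
pred = rbf_kernel(X_test, X) @ alpha
```

Setting all weights w_i = 1 recovers ordinary KRR; bounded weights, as in this Gaussian example on any compact region, correspond to the regime where the abstract reports minimax optimal rates.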
