Paper Title
Learn to Expect the Unexpected: Probably Approximately Correct Domain Generalization
Paper Authors
Paper Abstract
Domain generalization is the problem of machine learning when the training data and the test data come from different data domains. We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions, and those data distributions may even have different supports. In our model, the training data given to a learning algorithm consists of multiple datasets, each from a single domain drawn in turn from the meta-distribution. We study this model in three different problem settings---a multi-domain Massart noise setting, a decision tree multi-dataset setting, and a feature selection setting---and find that computationally efficient, polynomial-sample domain generalization is possible in each. Experiments demonstrate that our feature selection algorithm indeed ignores spurious correlations and improves generalization.
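The abstract's data-generation model can be sketched concretely: first sample domains from a meta-distribution, then draw one dataset per sampled domain. The sketch below is an assumed illustration, not the paper's construction; the choice of Gaussians for both the meta-distribution and the per-domain distribution is hypothetical.

```python
# Minimal sketch (assumed, not from the paper) of training data drawn
# from a meta-distribution over data distributions: each domain is a
# 1-D Gaussian whose mean is itself sampled from a standard normal.
import random

def sample_domain(rng):
    # Draw one domain (here, just its mean) from the meta-distribution.
    return rng.gauss(0.0, 1.0)

def sample_dataset(mean, n, rng):
    # Draw n i.i.d. examples from the chosen domain.
    return [rng.gauss(mean, 1.0) for _ in range(n)]

def training_data(num_domains, n_per_domain, seed=0):
    # The learner receives multiple datasets, one per sampled domain.
    rng = random.Random(seed)
    return [sample_dataset(sample_domain(rng), n_per_domain, rng)
            for _ in range(num_domains)]

data = training_data(num_domains=3, n_per_domain=5)
print(len(data), len(data[0]))  # 3 datasets, 5 examples each
```

A learner in this setting sees only the per-domain datasets, while the test domain is a fresh draw from the same meta-distribution.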