Paper Title
Learn to Expect the Unexpected: Probably Approximately Correct Domain Generalization
Paper Authors
Paper Abstract
Domain generalization is the problem of machine learning when the training data and the test data come from different data domains. We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions, and those data distributions may even have different supports. In our model, the training data given to a learning algorithm consists of multiple datasets, each from a single domain drawn in turn from the meta-distribution. We study this model in three different problem settings---a multi-domain Massart noise setting, a decision tree multi-dataset setting, and a feature selection setting---and find that computationally efficient, polynomial-sample domain generalization is possible in each. Experiments demonstrate that our feature selection algorithm indeed ignores spurious correlations and improves generalization.
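The abstract's data-generation model can be sketched concretely: first sample domains from a meta-distribution, then draw one dataset per sampled domain. The sketch below is an assumed illustration, not the paper's construction; the choice of Gaussians for both the meta-distribution and the per-domain distribution is hypothetical.

```python
# Minimal sketch (assumed, not from the paper) of training data drawn
# from a meta-distribution over data distributions: each domain is a
# 1-D Gaussian whose mean is itself sampled from a standard normal.
import random

def sample_domain(rng):
    # Draw one domain (here, just its mean) from the meta-distribution.
    return rng.gauss(0.0, 1.0)

def sample_dataset(mean, n, rng):
    # Draw n i.i.d. examples from the chosen domain.
    return [rng.gauss(mean, 1.0) for _ in range(n)]

def training_data(num_domains, n_per_domain, seed=0):
    # The learner receives multiple datasets, one per sampled domain.
    rng = random.Random(seed)
    return [sample_dataset(sample_domain(rng), n_per_domain, rng)
            for _ in range(num_domains)]

data = training_data(num_domains=3, n_per_domain=5)
print(len(data), len(data[0]))  # 3 datasets, 5 examples each
```

A learner in this setting sees only the per-domain datasets, while the test domain is a fresh draw from the same meta-distribution.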