Paper Title
Stable Prediction via Leveraging Seed Variable
Paper Authors
Paper Abstract
In this paper, we focus on the problem of stable prediction across unknown test data, where the test distribution is agnostic and might be totally different from the training one. In such a case, previous machine learning methods might exploit subtle spurious correlations in the training data, induced by non-causal variables, for prediction. Those spurious correlations can change across datasets, making predictions unstable. To address this problem, we assume that the relationships between the causal variables and the response variable are invariant across data, and we propose an algorithm based on conditional independence tests that separates the causal variables using a seed variable as prior knowledge and adopts them for stable prediction. Under the additional assumption that causal and non-causal variables are independent, we show, both theoretically and empirically, that our algorithm can precisely separate causal and non-causal variables for stable prediction across test data. Extensive experiments on both synthetic and real-world datasets demonstrate that our algorithm outperforms state-of-the-art methods for stable prediction.
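The abstract names conditional independence testing as the building block but does not spell out the procedure, so the following is only a minimal illustrative sketch, not the authors' algorithm. It assumes linear-Gaussian data and uses a Fisher-z partial-correlation test (a swapped-in, standard technique). The function name `partial_corr_pvalue`, the toy variables, and the data-generating process are all hypothetical. The toy data show one intuition that such a test can exploit: two independent causes of a response become dependent once the response is conditioned on, whereas a variable unrelated to everything stays independent of the seed.

```python
import math
import numpy as np


def partial_corr_pvalue(x, y, z):
    """Two-sided Fisher-z p-value for H0: x is independent of y given z.

    Assumption (not from the paper): relationships are linear with
    Gaussian noise, so zero partial correlation means independence.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(x), -1)
    design = np.column_stack([np.ones(len(x)), z])
    # Residuals of x and y after linearly regressing out z.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.corrcoef(rx, ry)[0, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    dof = len(x) - z.shape[1] - 3
    stat = math.sqrt(dof) * abs(math.atanh(r))
    # Two-sided tail probability of a standard normal.
    return math.erfc(stat / math.sqrt(2.0))


# Hypothetical toy data: `seed_var` and `other_cause` both cause `response`;
# `noise_var` is independent of everything else.
rng = np.random.default_rng(0)
n = 2000
seed_var = rng.normal(size=n)
other_cause = rng.normal(size=n)
noise_var = rng.normal(size=n)
response = seed_var + other_cause + 0.5 * rng.normal(size=n)

# Conditioning on the response couples the two causes (tiny p-value) ...
p_causal = partial_corr_pvalue(other_cause, seed_var, response)
# ... while the unrelated variable stays independent of the seed.
p_noise = partial_corr_pvalue(noise_var, seed_var, response)
```

In this sketch a small `p_causal` flags `other_cause` as related to the seed through the response, while `p_noise` gives no such evidence. How the paper handles non-causal variables that are spuriously correlated with the response is not stated in the abstract and is not modeled here.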