Paper title
Detecting hidden confounding in observational data using multiple environments
Paper authors
Paper abstract
A common assumption in causal inference from observational data is that there is no hidden confounding. Yet it is, in general, impossible to verify this assumption from a single dataset. Under the assumption of independent causal mechanisms underlying the data-generating process, we demonstrate a way to detect unobserved confounders when having multiple observational datasets coming from different environments. We present a theory for testable conditional independencies that are only absent when there is hidden confounding and examine cases where we violate its assumptions: degenerate & dependent mechanisms, and faithfulness violations. Additionally, we propose a procedure to test these independencies and study its empirical finite-sample behavior using simulation studies and semi-synthetic data based on a real-world dataset. In most cases, the proposed procedure correctly predicts the presence of hidden confounding, particularly when the confounding bias is large.