Robust Direct Learning for Causal Data Fusion

Paper Title

Robust Direct Learning for Causal Data Fusion

Authors

Li, Xinyu; Li, Yilin; Cui, Qing; Li, Longfei; Zhou, Jun

Abstract

In the era of big data, the explosive growth of multi-source heterogeneous data offers many exciting challenges and opportunities for improving inference on conditional average treatment effects. In this paper, we investigate homogeneous and heterogeneous causal data fusion problems under a general setting that allows for source-specific covariates. We provide a direct learning framework for integrating multi-source data that separates the treatment effect from the other nuisance functions and achieves double robustness against certain model misspecifications. To improve estimation precision and stability, we propose a causal information-aware weighting function motivated by theoretical insights from semiparametric efficiency theory; it assigns larger weights to samples that carry more causal information and is highly interpretable. We introduce a two-step algorithm, the weighted multi-source direct learner, which constructs a pseudo-outcome and regresses it on the covariates under a weighted least squares criterion; it offers a powerful tool for causal data fusion, with the advantages of easy implementation, double robustness and model flexibility. In simulation studies, we demonstrate the effectiveness of the proposed methods in both homogeneous and heterogeneous causal data fusion scenarios.
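The generic two-step recipe the abstract describes (build a pseudo-outcome, then regress it on the covariates by weighted least squares) can be illustrated with a minimal single-source sketch in Python/NumPy. This is not the authors' weighted multi-source direct learner. It swaps in the standard AIPW (doubly robust) pseudo-outcome and assumes a known propensity score, linear outcome models and a linear CATE. It also uses the overlap-style weight π(x)(1 − π(x)) as a hypothetical stand-in for the causal information-aware weighting function. All function names and the simulated data are illustrative assumptions.

```python
import numpy as np


def aipw_pseudo_outcome(y, a, pi, mu0, mu1):
    """AIPW pseudo-outcome; E[psi | X] = tau(X) if either pi or (mu0, mu1) is correct."""
    return mu1 - mu0 + a * (y - mu1) / pi - (1 - a) * (y - mu0) / (1 - pi)


def weighted_least_squares(design, target, weights):
    """Return argmin_beta sum_i w_i * (target_i - design_i @ beta) ** 2."""
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], target * sw, rcond=None)
    return beta


def two_step_learner(x, a, y, pi, weights):
    """Hypothetical single-source sketch: linear nuisance models, linear CATE."""
    design = np.column_stack([np.ones(len(x)), x])
    # Step 1: fit outcome regressions separately in each treatment arm (OLS).
    mu0 = design @ np.linalg.lstsq(design[a == 0], y[a == 0], rcond=None)[0]
    mu1 = design @ np.linalg.lstsq(design[a == 1], y[a == 1], rcond=None)[0]
    psi = aipw_pseudo_outcome(y, a, pi, mu0, mu1)
    # Step 2: regress the pseudo-outcome on the covariates by weighted least squares.
    return weighted_least_squares(design, psi, weights)


# Simulated example: known propensity score, true CATE tau(x) = 1 + 2x.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-0.5 * x))
a = rng.binomial(1, pi)
y = 0.5 * x + a * (1.0 + 2.0 * x) + rng.normal(size=n)

weights = pi * (1.0 - pi)  # hypothetical overlap-style weight, not the paper's
beta = two_step_learner(x, a, y, pi, weights)
print(beta)  # intercept and slope estimates, expected to be near 1.0 and 2.0
```

In this simulated setting with unit-variance noise and correct outcome models, the pseudo-outcome has conditional variance 1/(π(1 − π)), so the weight π(1 − π) is its inverse variance. That is one way to see why efficiency arguments favour downweighting samples with poor overlap. Multiple sources and source-specific covariates are not modelled in this sketch.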
