Paper title
Towards optimal doubly robust estimation of heterogeneous causal effects
Paper authors
Abstract
Heterogeneous effect estimation plays a crucial role in causal inference, with applications across medicine and social science. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but there are important theoretical gaps in understanding if and when such methods are optimal. This is especially true when the CATE has nontrivial structure (e.g., smoothness or sparsity). Our work contributes in several main ways. First, we study a two-stage doubly robust CATE estimator and give a generic model-free error bound, which, despite its generality, yields sharper results than those in the current literature. We apply the bound to derive error rates in nonparametric models with smoothness or sparsity, and give sufficient conditions for oracle efficiency. Underlying our error bound is a general oracle inequality for regression with estimated or imputed outcomes, which is of independent interest; this is the second main contribution. The third contribution is aimed at understanding the fundamental statistical limits of CATE estimation. To that end, we propose and study a local polynomial adaptation of double-residual regression. We show that this estimator can be oracle efficient under even weaker conditions, if used with a specialized form of sample splitting and careful choices of tuning parameters. These are the weakest conditions currently found in the literature, and we conjecture that they are minimal in a minimax sense. We go on to give error bounds in the non-trivial regime where oracle rates cannot be achieved. Some finite-sample properties are explored with simulations.
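The two-stage doubly robust construction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard augmented inverse-probability-weighted pseudo-outcome, a single half/half sample split, and deliberately simple nuisance models (logistic regression for the propensity score, linear regressions for the outcome). The function names `fit_linear`, `fit_logistic`, `dr_learner`, and `simulate`, the clipping constant, and the simulated data-generating process are all hypothetical choices made for illustration.

```python
import numpy as np


def _design(X):
    """Prepend an intercept column to the covariate matrix."""
    return np.column_stack([np.ones(len(X)), X])


def fit_linear(X, y):
    """Ordinary least squares with intercept; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return coef


def fit_logistic(X, a, n_iter=25):
    """Logistic regression fitted with a fixed number of Newton steps."""
    Z = _design(X)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (a - p)
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def dr_learner(X, a, y, clip=0.01):
    """Two-stage doubly robust CATE sketch with one sample split.

    Stage 1 (first half): estimate the propensity score and the
    outcome regressions under treatment and control.
    Stage 2 (second half): build pseudo-outcomes and regress them on X.
    Returns the coefficients of a linear CATE model (assumed form).
    """
    half = len(y) // 2
    Xtr, atr, ytr = X[:half], a[:half], y[:half]
    Xes, aes, yes = X[half:], a[half:], y[half:]

    # Stage 1: nuisance estimates from the training half only.
    pi_coef = fit_logistic(Xtr, atr)
    mu1_coef = fit_linear(Xtr[atr == 1], ytr[atr == 1])
    mu0_coef = fit_linear(Xtr[atr == 0], ytr[atr == 0])

    # Evaluate nuisances on the held-out half; clip propensities
    # away from 0 and 1 to keep the inverse weights bounded.
    Zes = _design(Xes)
    pi = np.clip(1.0 / (1.0 + np.exp(-Zes @ pi_coef)), clip, 1.0 - clip)
    m1, m0 = Zes @ mu1_coef, Zes @ mu0_coef
    m_obs = np.where(aes == 1, m1, m0)

    # Doubly robust pseudo-outcome (AIPW form).
    phi = (aes - pi) / (pi * (1.0 - pi)) * (yes - m_obs) + m1 - m0

    # Stage 2: regress the pseudo-outcome on the covariates.
    return fit_linear(Xes, phi)


def simulate(n, seed=0):
    """Toy data with true CATE tau(x) = 2 + 0.5 * x (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 1))
    p = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
    a = rng.binomial(1, p)
    y = 1.0 + X[:, 0] + a * (2.0 + 0.5 * X[:, 0]) + rng.normal(size=n)
    return X, a, y


if __name__ == "__main__":
    X, a, y = simulate(4000, seed=0)
    print(dr_learner(X, a, y))  # [intercept, slope] of the estimated CATE
```

In this toy setting every model is correctly specified, so the second-stage coefficients should land near the true intercept 2 and slope 0.5. The local polynomial double-residual estimator and the specialized sample splitting discussed in the abstract are not reproduced here.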