Paper Title
Safe Policy Learning under Regression Discontinuity Designs with Multiple Cutoffs
Authors
Abstract
The regression discontinuity (RD) design is widely used for program evaluation with observational data. The existing literature has focused primarily on estimating the local average treatment effect at the existing treatment cutoff. In contrast, we consider policy learning under the RD design. Because the treatment assignment mechanism is deterministic, learning better treatment cutoffs requires extrapolation. We develop a robust optimization approach to finding optimal treatment cutoffs that improve upon the existing ones. We first decompose the expected utility into point-identifiable and unidentifiable components. We then propose an efficient doubly robust estimator for the identifiable parts. To account for the unidentifiable components, we leverage the existence of multiple cutoffs, which are common under the RD design. Specifically, we assume that the heterogeneity in the conditional expectations of potential outcomes across different groups varies smoothly along the running variable. Under this assumption, we minimize the worst-case utility loss relative to the status quo policy. The resulting new treatment cutoffs carry a safety guarantee: they will not yield a worse overall outcome than the existing cutoffs. Finally, we establish asymptotic regret bounds for the learned policy using semiparametric efficiency theory. We apply the proposed methodology to empirical and simulated data sets.
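The safety guarantee in the abstract can be read as a short max-min argument. The sketch below uses notation that is assumed here for illustration and is not taken from the paper: $V(\pi; m)$ is the expected utility of a policy $\pi$ under a candidate model $m$ for the unidentifiable conditional expectations, $\mathcal{M}$ is the set of models consistent with the smoothness assumption, $\tilde{\pi}$ is the status quo policy, and $\Pi$ is a policy class that contains $\tilde{\pi}$.

```latex
\begin{align*}
\hat{\pi} &\in \arg\max_{\pi \in \Pi} \; \min_{m \in \mathcal{M}}
  \bigl\{ V(\pi; m) - V(\tilde{\pi}; m) \bigr\}, \\
\min_{m \in \mathcal{M}} \bigl\{ V(\hat{\pi}; m) - V(\tilde{\pi}; m) \bigr\}
  &\ge \min_{m \in \mathcal{M}} \bigl\{ V(\tilde{\pi}; m) - V(\tilde{\pi}; m) \bigr\} = 0 .
\end{align*}
```

Under this reading, if the true model lies in $\mathcal{M}$, then $V(\hat{\pi}) \ge V(\tilde{\pi})$ at the population level, because the status quo is itself a feasible choice with zero worst-case loss. With estimated components the inequality would presumably hold only up to estimation error, which is the role the regret bounds appear to play.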