Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls

Paper Title

Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls

Authors

Yiping Lu, Jiajin Li, Lexing Ying, Jose Blanchet

Abstract

The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where pre-treatment outcome data is available and the synthetic control estimator is invoked. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we surprisingly observe that the optimal experimental design problem can be reduced to a so-called phase synchronization problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data is sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics data and the Abadie-Diamond-Hainmueller California smoking data. In terms of root mean square error, our algorithm surpasses the random design by a large margin.
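
To make the phrase "generalized power method with spectral initialization" concrete, here is a minimal sketch of the textbook version of that idea for a ±1 synchronization problem. It is not the paper's algorithm: the abstract does not spell out the objective or the normalization step, so the symmetric matrix `A`, the planted vector `z`, the noise level, and the function name `generalized_power_method` are all illustrative assumptions.

```python
import numpy as np

def generalized_power_method(A, max_iter=100):
    """Heuristic for maximizing x^T A x over x in {-1, +1}^n (A symmetric).

    Generic sketch only; the paper's normalized variant may differ.
    """
    # Spectral initialization: sign pattern of the leading eigenvector.
    # np.linalg.eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(A)
    x = np.where(eigvecs[:, -1] >= 0, 1.0, -1.0)
    for _ in range(max_iter):
        # Power step, then projection back onto {-1, +1}^n.
        x_new = np.where(A @ x >= 0, 1.0, -1.0)
        if np.array_equal(x_new, x):
            break  # reached a fixed point
        x = x_new
    return x

# Hypothetical toy instance: a planted +/-1 vector plus small symmetric noise.
rng = np.random.default_rng(0)
n = 40
z = rng.choice([-1.0, 1.0], size=n)
noise = rng.normal(size=(n, n))
A = np.outer(z, z) + 0.2 * (noise + noise.T) / 2.0

x_hat = generalized_power_method(A)
# The sign of the solution is not identifiable, so compare up to a global flip.
recovered = abs(float(x_hat @ z)) == n
```

In an experimental-design reading, the entries of `x_hat` would encode which units are assigned to treatment and which to control; that encoding is also an assumption made here for illustration.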

Paper Title