Interactive rank testing by betting

Paper title

Interactive rank testing by betting

Authors

Boyan Duan, Aaditya Ramdas, Larry Wasserman

Abstract

To test whether a treatment is perceptibly different from a placebo in a randomized experiment with covariates, classical nonparametric tests based on ranks of observations or residuals have been employed (e.g., by Rosenbaum), with finite-sample valid inference enabled via permutations. This paper proposes a different principle on which to base inference: if, with access to all covariates and outcomes but without access to any treatment assignments, one can form a ranking of the subjects that is sufficiently nonrandom (e.g., mostly treated followed by mostly control), then we can confidently conclude that there must be a treatment effect. Based on a more nuanced, quantifiable version of this principle, we design an interactive test called i-bet: the analyst forms a single permutation of the subjects one element at a time, and at each step the analyst bets toy money on whether that subject was actually treated or not, and learns the truth immediately after. The wealth process forms a real-valued measure of evidence against the global causal null, and we may reject the null at level $α$ if the wealth ever crosses $1/α$. Apart from providing a fresh "game-theoretic" principle on which to base the causal conclusion, the i-bet has other statistical and computational benefits, for example (a) allowing a human to adaptively design the test statistic based on increasing amounts of data being revealed (along with any working causal models and prior knowledge), and (b) not requiring permutation resampling: under the null, the wealth forms a nonnegative martingale, and the type-1 error control of the aforementioned decision rule follows from a tight inequality by Ville. Further, if the null is not rejected, new subjects can later be added and the test can simply be continued, without any corrections (unlike with permutation p-values).
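The betting procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact wealth update, so the multiplicative rule below is an assumption: each subject is taken to be treated with probability 1/2 independently under the null, the analyst stakes a fraction `bet` in [-1, 1] of current wealth (positive means "treated"), and wealth is multiplied by `1 + bet` on a correct guess and `1 - bet` on a wrong one. The names `run_ibet`, `make_greedy_strategy`, and `choose_next` are hypothetical and chosen only for illustration.

```python
def run_ibet(treated, choose_next, alpha=0.05):
    """Sketch of a betting test against the global null.

    treated     : list of bools, the hidden treatment assignments
    choose_next : callable taking the dict of already revealed assignments
                  and returning (subject_index, bet) with bet in [-1, 1]
    Returns (wealth_path, rejected).
    """
    revealed = {}          # subject index -> assignment learned so far
    wealth = 1.0           # the analyst starts with one unit of toy money
    path = [wealth]
    while len(revealed) < len(treated):
        i, bet = choose_next(revealed)
        if i in revealed or not -1.0 <= bet <= 1.0:
            raise ValueError("invalid subject or bet")
        outcome = 1.0 if treated[i] else -1.0
        # Assumed update: fair under the null, since E[outcome] = 0
        # when assignment is a fair coin flip, so wealth is a
        # nonnegative martingale.
        wealth *= 1.0 + bet * outcome
        revealed[i] = treated[i]   # the truth is learned immediately
        path.append(wealth)
        if wealth >= 1.0 / alpha:  # threshold from Ville's inequality
            return path, True
    return path, False


def make_greedy_strategy(outcomes, stake=0.5):
    """Hypothetical strategy: always pick the unrevealed subject with the
    largest outcome and bet a fixed stake that the subject was treated."""
    def choose_next(revealed):
        remaining = [j for j in range(len(outcomes)) if j not in revealed]
        i = max(remaining, key=lambda j: outcomes[j])
        return i, stake
    return choose_next
```

A real analyst would adapt both the ordering and the stake as assignments are revealed, for instance by refitting a working model after each step; the fixed-stake strategy here is only meant to show the mechanics of the wealth process and the stopping rule.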
