How to Tell When a Result Will Replicate: Significance and Replication in Distributional Null Hypothesis Tests

Paper title

How to Tell When a Result Will Replicate: Significance and Replication in Distributional Null Hypothesis Tests

Authors

Fintan Costello, Paul Watts

Abstract

There is a well-known problem in Null Hypothesis Significance Testing: many statistically significant results fail to replicate in subsequent experiments. We show that this problem arises because standard 'point-form null' significance tests consider only within-experiment variation and ignore between-experiment variation, and so systematically underestimate the degree of random variation in results. We give an extension to standard significance testing that addresses this problem by analysing both within- and between-experiment variation. This 'distributional null' approach does not underestimate experimental variability and so is not overconfident in identifying significance; because this approach addresses between-experiment variation, it gives mathematically coherent estimates for the probability of replication of significant results. Using a large-scale replication dataset (the first 'Many Labs' project), we show that many experimental results that appear statistically significant in standard tests are in fact consistent with random variation when both within- and between-experiment variation are taken into account in this approach. Further, grouping experiments in this dataset into 'predictor-target' pairs, we show that the predicted replication probabilities for target experiments produced in this approach (given predictor experiment results and the sample sizes of the two experiments) are strongly correlated with observed replication rates. Distributional null hypothesis testing thus gives researchers a statistical tool for identifying statistically significant and reliably replicable results.
