Paper Title
Prediction scoring of data-driven discoveries for reproducible research
Paper Authors
Paper Abstract
Predictive modeling uncovers knowledge and insights regarding a hypothesized data generating mechanism (DGM). Results from different studies on a complex DGM, derived from different data sets, and using complicated models and algorithms, are hard to quantitatively compare due to random noise and statistical uncertainty in model results. This has been one of the main contributors to the replication crisis in the behavioral sciences. The contribution of this paper is to apply prediction scoring to the problem of comparing two studies, such as can arise when evaluating replications or competing evidence. We examine the role of predictive models in quantitatively assessing agreement between two datasets that are assumed to come from two distinct DGMs. We formalize a distance between the DGMs that is estimated using cross validation. We argue that the resulting prediction scores depend on the predictive models created by cross validation. In this sense, the prediction scores measure the distance between DGMs, along the dimension of the particular predictive model. Using human behavior data from experimental economics, we demonstrate that prediction scores can be used to evaluate preregistered hypotheses and provide insights comparing data from different populations and settings. We examine the asymptotic behavior of the prediction scores using simulated experimental data and demonstrate that leveraging competing predictive models can reveal important differences between underlying DGMs. Our proposed cross-validated prediction scores are capable of quantifying differences between unobserved data generating mechanisms and allow for the validation and assessment of results from complex models.
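The abstract sketches the core idea: fit predictive models by cross-validation on one dataset and score their predictions on another, so the resulting prediction score acts as a model-dependent distance between the two unobserved DGMs. The Python sketch below illustrates one way such a score could be computed; the linear model, the mean-squared-error scoring rule, the simulated data, and the within-versus-cross score comparison are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, not the paper's implementation: two simulated datasets stand in
# for samples from two DGMs; K-fold cross-validation on a source dataset produces
# fitted models whose prediction error (MSE) is evaluated within the source and
# on the other dataset. The within/cross score gap is a rough, model-dependent
# proxy for the distance between the underlying DGMs.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold


def within_score(X, y, n_splits=5, seed=0):
    """Ordinary K-fold cross-validation score (MSE) within one dataset."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errs = []
    for tr, te in kf.split(X):
        model = LinearRegression().fit(X[tr], y[tr])
        errs.append(mean_squared_error(y[te], model.predict(X[te])))
    return float(np.mean(errs))


def cross_score(X_src, y_src, X_tgt, y_tgt, n_splits=5, seed=0):
    """MSE of models trained on folds of the source dataset, evaluated on the target dataset."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errs = []
    for tr, _ in kf.split(X_src):
        model = LinearRegression().fit(X_src[tr], y_src[tr])
        errs.append(mean_squared_error(y_tgt, model.predict(X_tgt)))
    return float(np.mean(errs))


# Hypothetical DGMs that differ only in the coefficient on the third covariate.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(500, 3))
y_a = X_a @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=500)
X_b = rng.normal(size=(500, 3))
y_b = X_b @ np.array([1.0, 0.5, 0.6]) + rng.normal(scale=0.5, size=500)

baseline = within_score(X_a, y_a)            # DGM A predicting itself
transfer = cross_score(X_a, y_a, X_b, y_b)   # A-trained models predicting B
print(f"within-A score: {baseline:.3f}, cross A->B score: {transfer:.3f}")
print(f"score gap (proxy distance along this model): {transfer - baseline:.3f}")
```

As the abstract notes, the score is measured "along the dimension of the particular predictive model": repeating the same comparison with competing model classes (in this sketch, swapping LinearRegression for another estimator) can reveal differences between DGMs that a single model would miss.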