Title
SciFact-Open: Towards open-domain scientific claim verification
Authors
Abstract
While research on scientific claim verification has led to the development of powerful systems that appear to approach human performance, these approaches have yet to be tested in a realistic setting against large corpora of scientific literature. Moving to this open-domain evaluation setting, however, poses unique challenges; in particular, it is infeasible to exhaustively annotate all evidence documents. In this work, we present SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems on a corpus of 500K research abstracts. Drawing upon pooling techniques from information retrieval, we collect evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models. We find that systems developed on smaller corpora struggle to generalize to SciFact-Open, exhibiting performance drops of at least 15 F1. In addition, analysis of the evidence in SciFact-Open reveals interesting phenomena likely to appear when claim verification systems are deployed in practice, e.g., cases where the evidence supports only a special case of the claim. Our dataset is available at https://github.com/dwadden/scifact-open.
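The pooling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_pool`, the data layout (each model mapping a claim ID to a ranked list of abstract IDs), and the toy inputs are all assumptions made for the example. The idea is simply to union the top-k predictions of each system into a per-claim pool for annotation, so that evidence need not be annotated exhaustively over the full 500K-abstract corpus.

```python
def build_pool(model_rankings, k=10):
    """Union the top-k predicted documents from each model, per claim.

    model_rankings: list of dicts, one per model, each mapping a
        claim_id to a ranked list of doc_ids (best first).
    Returns a dict mapping claim_id -> set of doc_ids to annotate.
    """
    pool = {}
    for rankings in model_rankings:
        for claim_id, ranked_docs in rankings.items():
            # Only the top-k predictions of each model enter the pool;
            # the set union deduplicates documents found by several models.
            pool.setdefault(claim_id, set()).update(ranked_docs[:k])
    return pool

# Toy example with two hypothetical models and one claim.
model_a = {"c1": ["d1", "d2", "d3"]}
model_b = {"c1": ["d2", "d4"]}
pool = build_pool([model_a, model_b], k=2)
# pool["c1"] == {"d1", "d2", "d4"}
```

Documents outside the pool are treated as non-evidence, which is the standard trade-off in IR-style pooled evaluation: annotation cost stays bounded, at the price of possibly missing relevant documents no pooled system retrieved.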