Modeling Information Change in Science Communication with Semantically Matched Paraphrases

Paper title

Modeling Information Change in Science Communication with Semantically Matched Paraphrases

Authors

Dustin Wright, Jiaxin Pei, David Jurgens, Isabelle Augenstein

Abstract

Whether the media faithfully communicate scientific information has long been a core issue to the science community. Automatically identifying paraphrased scientific findings could enable large-scale tracking and analysis of information changes in the science communication process, but this requires systems to understand the similarity between scientific information across multiple domains. To this end, we present the SCIENTIFIC PARAPHRASE AND INFORMATION CHANGE DATASET (SPICED), the first paraphrase dataset of scientific findings annotated for degree of information change. SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers. We demonstrate that SPICED poses a challenging task and that models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims. Finally, we show that models trained on SPICED can reveal large-scale trends in the degrees to which people and organizations faithfully communicate new scientific findings. Data, code, and pre-trained models are available at http://www.copenlu.com/publication/2022_emnlp_wright/.

Paper title