Check-worthy Claim Detection across Topics for Automated Fact-checking

Paper Title

Check-worthy Claim Detection across Topics for Automated Fact-checking

Authors

Amani S. Abumansour, Arkaitz Zubiaga

Abstract

An important component of an automated fact-checking system is the claim check-worthiness detection system, which ranks sentences by prioritising them based on their need to be checked. Despite a body of research tackling the task, previous research has overlooked the challenging nature of identifying check-worthy claims across different topics. In this paper, we assess and quantify the challenge of detecting check-worthy claims for new, unseen topics. After highlighting the problem, we propose the AraCWA model to mitigate the performance deterioration when detecting check-worthy claims across topics. The AraCWA model enables boosting the performance for new topics by incorporating two components for few-shot learning and data augmentation. Using a publicly available dataset of Arabic tweets consisting of 14 different topics, we demonstrate that our proposed data augmentation strategy achieves substantial improvements across topics overall, where the extent of the improvement varies across topics. Further, we analyse the semantic similarities between topics, suggesting that the similarity metric could be used as a proxy to determine the difficulty level of an unseen topic prior to undertaking the task of labelling the underlying sentences.
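The idea of using topic similarity as a proxy for the difficulty of an unseen topic can be illustrated with a minimal sketch. The abstract does not state which similarity measure or text representation the authors use, so this example assumes a plain bag-of-words cosine similarity as a stand-in. The function names (`topic_vector`, `cosine_similarity`, `mean_similarity_to_seen`) and the sample sentences are hypothetical and do not come from the paper or its dataset.

```python
import math
from collections import Counter


def topic_vector(sentences):
    """Build a bag-of-words count vector from all sentences of one topic."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity_to_seen(unseen_sentences, seen_topics):
    """Average similarity of an unseen topic to each already-labelled topic.

    Assumption: a lower score hints that the unseen topic may be harder,
    following the abstract's suggestion that similarity can act as a proxy
    for difficulty. The paper's actual metric may differ.
    """
    target = topic_vector(unseen_sentences)
    sims = [cosine_similarity(target, topic_vector(s)) for s in seen_topics.values()]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical placeholder data (not taken from the Arabic tweet dataset).
seen = {
    "health": ["the vaccine reduces infection rates", "hospital cases rose"],
    "sports": ["the team won the final match"],
}
unseen = ["new vaccine trial reports lower infection rates"]
score = mean_similarity_to_seen(unseen, seen)
```

Under these assumptions, the score can be computed from unlabelled sentences alone, which matches the abstract's point that difficulty could be estimated before any labelling effort is spent on the new topic.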
