Paper Title
Semantic Novelty Detection and Characterization in Factual Text Involving Named Entities
Paper Authors
Paper Abstract
Much of the existing work on text novelty detection operates at the topic level, i.e., it determines whether the topic of a document or a sentence is novel. Little work has been done at the fine-grained semantic (or contextual) level. For example, given that we know Elon Musk is the CEO of a technology company, the sentence "Elon Musk acted in the sitcom The Big Bang Theory" is novel and surprising because a CEO would not normally be an actor. Existing topic-based novelty detection methods perform poorly on this problem because they do not carry out semantic reasoning over the relations between named entities in the text and their background knowledge. This paper proposes an effective model, called PAT-SND, to solve the problem; the model can also characterize the novelty. An annotated dataset is also created. Evaluation shows that PAT-SND outperforms 10 baselines by large margins.