Paper Title

Don't Neglect the Obvious: On the Role of Unambiguous Words in Word Sense Disambiguation

Authors

Daniel Loureiro, Jose Camacho-Collados

Abstract

State-of-the-art methods for Word Sense Disambiguation (WSD) combine two different features: the power of pre-trained language models and a propagation method to extend the coverage of such models. This propagation is needed as current sense-annotated corpora lack coverage of many instances in the underlying sense inventory (usually WordNet). At the same time, unambiguous words make for a large portion of all words in WordNet, while being poorly covered in existing sense-annotated corpora. In this paper, we propose a simple method to provide annotations for most unambiguous words in a large corpus. We introduce the UWA (Unambiguous Word Annotations) dataset and show how a state-of-the-art propagation-based model can use it to extend the coverage and quality of its word sense embeddings by a significant margin, improving on its original results on WSD.
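
The core idea of annotating unambiguous words can be illustrated with a minimal sketch. It assumes a word counts as unambiguous when the sense inventory lists exactly one sense for it, so every occurrence can be labeled without a disambiguation model. The inventory `TOY_INVENTORY`, its sense keys, and the helpers `unambiguous_lemmas` and `annotate` are hypothetical and chosen for illustration; they are not the authors' pipeline, which works over WordNet and a large corpus.

```python
from typing import Dict, List, Tuple

# Hypothetical toy sense inventory: lemma -> list of sense keys.
# A real setup would read these from WordNet; the keys here are made up.
TOY_INVENTORY: Dict[str, List[str]] = {
    "bank": ["bank.n.01", "bank.n.02"],
    "river": ["river.n.01"],
    "keyboard": ["keyboard.n.01"],
    "run": ["run.v.01", "run.v.02", "run.n.01"],
}


def unambiguous_lemmas(inventory: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each lemma that has exactly one sense to that sense."""
    return {
        lemma: senses[0]
        for lemma, senses in inventory.items()
        if len(senses) == 1
    }


def annotate(
    tokens: List[str], inventory: Dict[str, List[str]]
) -> List[Tuple[int, str, str]]:
    """Return (position, token, sense) for every unambiguous token.

    Assumption: tokens are matched by lowercased surface form only;
    lemmatization and part-of-speech filtering are omitted here.
    """
    mono = unambiguous_lemmas(inventory)
    return [
        (i, tok, mono[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in mono
    ]


sentence = "The keyboard fell into the river near the bank".split()
annotations = annotate(sentence, TOY_INVENTORY)
for position, token, sense in annotations:
    print(position, token, sense)
```

In this toy example "keyboard" and "river" receive labels, while "bank" is skipped because the inventory lists two senses for it. Annotations collected this way could then serve as extra sense-labeled examples for a propagation-based model.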
