Paper Title

MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages

Authors

Shashank Sonkar, Zichao Wang, Richard G. Baraniuk

Abstract

This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. NER is a fundamental task in Natural Language Processing (NLP). A critical driver accelerating NER systems' progress is the existence of large-scale language corpora that enable NER systems to achieve outstanding performance in languages such as English and French with abundant training data. However, NER for low-resource languages remains relatively unexplored. In this paper, we introduce Mask Augmented Named Entity Recognition (MANER), a new methodology that leverages the distributional hypothesis of pre-trained masked language models (MLMs) for NER. The <mask> token in pre-trained MLMs encodes valuable semantic contextual information. MANER re-purposes the <mask> token for NER prediction. Specifically, we prepend the <mask> token to every word in a sentence for which we would like to predict the named entity tag. During training, we jointly fine-tune the MLM and a new NER prediction head attached to each <mask> token. We demonstrate that MANER is well-suited for NER in low-resource languages; our experiments show that for 100 languages with as few as 100 training examples, it improves on state-of-the-art methods by up to 48% and by 12% on average on F1 score. We also perform detailed analyses and ablation studies to understand the scenarios that are best-suited to MANER.
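The abstract describes the core mechanism: a <mask> token is prepended to every word, and a new NER head classifies the contextual representation at each <mask> position while the MLM is fine-tuned jointly. Below is a minimal sketch of that idea, assuming a Hugging Face masked language model (xlm-roberta-base is used here only as an example) and an illustrative BIO label set; the paper's exact architecture, label inventory, and training details may differ.

```python
# Minimal sketch of the MANER idea from the abstract: prepend <mask> before every
# word, then classify each <mask> position with a new NER head. Model choice,
# head design, and label set are illustrative assumptions, not the authors' setup.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

NUM_LABELS = 9  # assumed BIO tag set (e.g., PER/ORG/LOC/MISC + O)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

class ManerSketch(nn.Module):
    def __init__(self, mlm, num_labels):
        super().__init__()
        self.mlm = mlm  # fine-tuned jointly with the new head
        self.ner_head = nn.Linear(mlm.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, mask_positions):
        outputs = self.mlm(input_ids=input_ids,
                           attention_mask=attention_mask,
                           output_hidden_states=True)
        hidden = outputs.hidden_states[-1]  # (batch, seq_len, hidden)
        # Gather the hidden state at every <mask> position.
        batch_idx = torch.arange(hidden.size(0)).unsqueeze(1)
        mask_states = hidden[batch_idx, mask_positions]  # (batch, n_words, hidden)
        return self.ner_head(mask_states)                # (batch, n_words, num_labels)

def mask_augment(words, tokenizer):
    """Prepend <mask> to every word and record where each <mask> lands."""
    pieces, mask_positions = [tokenizer.cls_token_id], []
    for word in words:
        mask_positions.append(len(pieces))
        pieces.append(tokenizer.mask_token_id)
        pieces.extend(tokenizer.encode(word, add_special_tokens=False))
    pieces.append(tokenizer.sep_token_id)
    return pieces, mask_positions

words = ["Rice", "University", "is", "in", "Houston"]
input_ids, mask_positions = mask_augment(words, tokenizer)
model = ManerSketch(mlm, NUM_LABELS)
logits = model(torch.tensor([input_ids]),
               torch.ones(1, len(input_ids), dtype=torch.long),
               torch.tensor([mask_positions]))
print(logits.shape)  # (1, number of words, NUM_LABELS)
```

One tag is predicted per word via its prepended <mask>, so the <mask> representation, which the pre-trained MLM already uses to encode contextual information, is re-purposed as the NER feature; in a full training loop the logits would be compared against gold tags with cross-entropy while both the MLM and the head are updated.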
