Recognizing Nested Entities from Flat Supervision: A New NER Subtask, Feasibility and Challenges

Paper Title

Recognizing Nested Entities from Flat Supervision: A New NER Subtask, Feasibility and Challenges

Authors

Enwei Zhu, Yiyang Liu, Ming Jin, Jinpeng Li

Abstract

Many recent named entity recognition (NER) studies criticize flat NER for its non-overlapping assumption and switch to investigating nested NER. However, existing nested NER models rely heavily on training data annotated with nested entities, and labeling such data is costly. This study proposes a new subtask, nested-from-flat NER, which corresponds to a realistic application scenario: given data annotated with flat entities only, one may still want the trained model to recognize nested entities. To address this task, we train span-based models and deliberately ignore the spans nested inside labeled entities, since these spans are possibly unlabeled entities. With nested entities removed from the training data, our model achieves 54.8%, 54.2% and 41.1% F1 scores on the subset of spans within entities on ACE 2004, ACE 2005 and GENIA, respectively. This suggests the effectiveness of our approach and the feasibility of the task. In addition, the model's performance on flat entities is entirely unaffected. We further manually annotate the nested entities in the test set of CoNLL 2003, creating a nested-from-flat NER benchmark. Analysis results show that the main challenges stem from the data and annotation inconsistencies between the flat and nested entities.
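
The central idea in the abstract, ignoring spans nested inside labeled entities during training, can be illustrated with a small sketch. This is not the authors' implementation: the function name `build_span_targets`, the `max_span_len` parameter, the "O" negative label, and the use of `None` to mark ignored spans are all assumptions made for illustration. It enumerates candidate spans and assigns each one a target: the entity type if the span is labeled, `None` (excluded from the loss) if it lies inside a labeled entity, and "O" otherwise.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def build_span_targets(
    num_tokens: int,
    flat_entities: Dict[Span, str],
    max_span_len: int = 10,  # hypothetical cap on candidate span length
) -> List[Tuple[Span, Optional[str]]]:
    """Assign a training target to every candidate span.

    The target is the entity type for a labeled span, "O" for a negative
    span, and None for a span that should be left out of the loss.
    """
    targets: List[Tuple[Span, Optional[str]]] = []
    for start in range(num_tokens):
        last_end = min(start + max_span_len, num_tokens)
        for end in range(start + 1, last_end + 1):
            span = (start, end)
            if span in flat_entities:
                # Labeled flat entity: ordinary positive example.
                targets.append((span, flat_entities[span]))
            elif any(s <= start and end <= e for (s, e) in flat_entities):
                # Nested inside a labeled entity: possibly an unlabeled
                # entity, so it is neither a positive nor a negative.
                targets.append((span, None))
            else:
                # Everything else is treated as a negative example.
                targets.append((span, "O"))
    return targets


# Example sentence: "Bank of China opened today", with one flat ORG entity.
tokens = ["Bank", "of", "China", "opened", "today"]
targets = dict(build_span_targets(len(tokens), {(0, 3): "ORG"}))
ignored = sorted(span for span, label in targets.items() if label is None)
```

In this example the span covering "China" alone, `(2, 3)`, ends up in `ignored`, so a model trained this way is never told that "China" is a non-entity and remains free to predict it as a nested entity at inference time. How spans that only partially overlap a labeled entity are treated is a design choice this sketch resolves by marking them as negatives.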
