Paper Title

T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition

Paper Authors

Asahi Ushio, Jose Camacho-Collados

Paper Abstract

Language model (LM) pretraining has led to consistent improvements in many NLP downstream tasks, including named entity recognition (NER). In this paper, we present T-NER (Transformer-based Named Entity Recognition), a Python library for NER LM finetuning. In addition to its practical utility, T-NER facilitates the study and investigation of the cross-domain and cross-lingual generalization ability of LMs finetuned on NER. Our library also provides a web app where users can get model predictions interactively for arbitrary text, which facilitates qualitative model evaluation for non-expert programmers. We show the potential of the library by compiling nine public NER datasets into a unified format and evaluating the cross-domain and cross-lingual performance across the datasets. The results from our initial experiments show that in-domain performance is generally competitive across datasets. However, cross-domain generalization is challenging even with a large pretrained LM, which nevertheless has the capacity to learn domain-specific features if fine-tuned on a combined dataset. To facilitate future research, we also release all our LM checkpoints via the Hugging Face model hub.
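
Since the abstract notes that all fine-tuned checkpoints are released via the Hugging Face model hub, they can be tried out directly with the generic `transformers` NER pipeline, without T-NER's own wrappers. The sketch below illustrates this; the checkpoint identifier `tner/roberta-large-ontonotes5` is an assumption based on the hub's naming conventions, so consult the model hub for the exact names of the released checkpoints.

```python
# Minimal inference sketch using the Hugging Face `transformers` pipeline.
# The checkpoint name below is an assumption (not confirmed by the paper);
# browse the Hugging Face model hub for the checkpoints actually released.
from transformers import pipeline

ner = pipeline(
    "ner",
    model="tner/roberta-large-ontonotes5",  # assumed checkpoint identifier
    aggregation_strategy="simple",          # merge sub-word tokens into entity spans
)

for entity in ner("Jacob Collier is a Grammy-awarded artist from London."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```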
