Paper Title

T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition

Paper Authors

Asahi Ushio, Jose Camacho-Collados

Paper Abstract

Language model (LM) pretraining has led to consistent improvements in many NLP downstream tasks, including named entity recognition (NER). In this paper, we present T-NER (Transformer-based Named Entity Recognition), a Python library for NER LM finetuning. In addition to its practical utility, T-NER facilitates the study and investigation of the cross-domain and cross-lingual generalization ability of LMs finetuned on NER. Our library also provides a web app where users can get model predictions interactively for arbitrary text, which facilitates qualitative model evaluation for non-expert programmers. We show the potential of the library by compiling nine public NER datasets into a unified format and evaluating the cross-domain and cross-lingual performance across the datasets. The results from our initial experiments show that in-domain performance is generally competitive across datasets. However, cross-domain generalization is challenging even with a large pretrained LM, which nevertheless has the capacity to learn domain-specific features if fine-tuned on a combined dataset. To facilitate future research, we also release all our LM checkpoints via the Hugging Face model hub.
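
Since the abstract notes that all fine-tuned checkpoints are released via the Hugging Face model hub, they can be tried out directly with the generic `transformers` NER pipeline, without T-NER's own wrappers. The sketch below illustrates this; the checkpoint identifier `tner/roberta-large-ontonotes5` is an assumption based on the hub's naming conventions, so consult the model hub for the exact names of the released checkpoints.

```python
# Minimal inference sketch using the Hugging Face `transformers` pipeline.
# The checkpoint name below is an assumption (not confirmed by the paper);
# browse the Hugging Face model hub for the checkpoints actually released.
from transformers import pipeline

ner = pipeline(
    "ner",
    model="tner/roberta-large-ontonotes5",  # assumed checkpoint identifier
    aggregation_strategy="simple",          # merge sub-word tokens into entity spans
)

for entity in ner("Jacob Collier is a Grammy-awarded artist from London."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```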
