Paper Title

Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition

Paper Authors

Samuel Belkadi, Lifeng Han, Yuping Wu, Goran Nenadic

Paper Abstract

The practice of fine-tuning Pre-trained Language Models (PLMs) from general or domain-specific data to a specific task with limited resources has gained popularity within the field of natural language processing (NLP). In this work, we revisit this assumption and carry out an investigation in clinical NLP, specifically Named Entity Recognition on drugs and their related attributes. We compare Transformer models trained from scratch to fine-tuned BERT-based LLMs, namely BERT, BioBERT, and ClinicalBERT. Furthermore, we examine the impact of an additional CRF layer on such models to encourage contextual learning. We use the n2c2-2018 shared task data for model development and evaluation. The experimental outcomes show that 1) CRF layers improved all language models; 2) referring to BIO-strict span-level evaluation using macro-average F1 score, although the fine-tuned LLMs achieved 0.83+ scores, the TransformerCRF model trained from scratch achieved 0.78+, demonstrating comparable performance at much lower cost, e.g. with 39.80% fewer training parameters; 3) referring to BIO-strict span-level evaluation using weighted-average F1 score, ClinicalBERT-CRF, BERT-CRF, and TransformerCRF exhibited smaller score differences, at 97.59%/97.44%/96.84% respectively; 4) applying efficient training by down-sampling for a better data distribution further reduced the training cost and the need for data while maintaining similar scores, i.e. around 0.02 points lower compared to using the full dataset. Our models will be hosted at https://github.com/HECTA-UoM/TransformerCRF
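
As a rough illustration of the encoder-plus-CRF architecture the abstract describes, below is a minimal sketch, not the authors' released implementation. The encoder checkpoint name, the tag count, and the use of the pytorch-crf package are assumptions made for the example.

```python
# Minimal sketch of an encoder + CRF tagger for BIO-style clinical NER.
# Assumptions: the `transformers` and `pytorch-crf` packages, and a
# hypothetical tag set derived from the n2c2-2018 drug/attribute labels.
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf


class EncoderCRFTagger(nn.Module):
    def __init__(self, encoder_name: str, num_tags: int):
        super().__init__()
        # Swap encoder_name between e.g. "bert-base-cased", a BioBERT, or a
        # ClinicalBERT checkpoint to mimic the compared fine-tuned models.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.emissions = nn.Linear(self.encoder.config.hidden_size, num_tags)
        # The CRF layer learns tag-transition scores so that BIO sequences
        # are decoded jointly rather than token by token.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        scores = self.emissions(hidden)
        mask = attention_mask.bool()
        if labels is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(scores, labels, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence per input.
        return self.crf.decode(scores, mask=mask)
```

For the reported BIO-strict span-level scores, a library such as seqeval (with mode="strict" and the IOB2 scheme) can compute macro- and weighted-average F1 over the decoded tag sequences; the TransformerCRF variant would replace the pre-trained encoder with a Transformer encoder trained from scratch.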
