Paper Title
Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences
Paper Authors
Paper Abstract
Transformer-based models, such as BERT, have dramatically improved performance on various natural language processing tasks. The clinical knowledge enriched model, namely ClinicalBERT, also achieved state-of-the-art results when applied to clinical named entity recognition and natural language inference tasks. One of the core limitations of these transformers is the substantial memory consumption due to their full self-attention mechanism. To overcome this, long sequence transformer models, e.g. Longformer and BigBird, were proposed with sparse attention mechanisms that reduce memory usage from quadratic to linear in the sequence length. These models extended the maximum input sequence length from 512 to 4096, which enhanced the ability to model long-term dependencies and consequently achieved optimal results in a variety of tasks. Inspired by the success of these long sequence transformer models, we introduce two domain-enriched language models, namely Clinical-Longformer and Clinical-BigBird, which are pre-trained from large-scale clinical corpora. We evaluate both pre-trained models using 10 baseline tasks including named entity recognition, question answering, and document classification tasks. The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT as well as other short-sequence transformers in all downstream tasks. We have made our source code available at [https://github.com/luoyuanlab/Clinical-Longformer] and the pre-trained models available for public download at [https://huggingface.co/yikuan8/Clinical-Longformer].
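Below is a minimal sketch (not taken from the paper or its repository) of how the publicly released Clinical-Longformer checkpoint could be loaded with the Hugging Face transformers library and applied to a long clinical note. The example note text is a placeholder, and placing global attention on the first token is an assumption that follows common Longformer usage rather than anything specified in the abstract.

```python
# Sketch: encode a long clinical note (up to 4096 tokens) with Clinical-Longformer.
# Assumes the checkpoint published at https://huggingface.co/yikuan8/Clinical-Longformer.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("yikuan8/Clinical-Longformer")
model = AutoModel.from_pretrained("yikuan8/Clinical-Longformer")
model.eval()

# Placeholder clinical note; real inputs would be full discharge summaries or progress notes.
note = "CHIEF COMPLAINT: shortness of breath. HISTORY OF PRESENT ILLNESS: ..."

# Long-sequence transformers accept up to 4096 tokens instead of the usual 512.
inputs = tokenizer(note, truncation=True, max_length=4096, return_tensors="pt")

# Illustrative choice: give the first token global attention, as is typical for
# Longformer-style models; all other positions use sliding-window (sparse) attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

# The contextual embeddings can feed downstream NER, QA, or document classification heads.
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```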