Paper Title
Large Language Models are Few-Shot Clinical Information Extractors
Paper Authors
Abstract
A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. However, roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. In this work, we show that large language models, such as InstructGPT, perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. Whereas text classification and generation performance have already been studied extensively in such models, here we additionally demonstrate how to leverage them to tackle a diverse set of NLP tasks which require more structured outputs, including span identification, token-level sequence classification, and relation extraction. Further, due to the dearth of available data to evaluate these systems, we introduce new datasets for benchmarking few-shot clinical information extraction based on a manual re-annotation of the CASI dataset for new tasks. On the clinical extraction tasks we studied, the GPT-3 systems significantly outperform existing zero- and few-shot baselines.