Generalizing over Long Tail Concepts for Medical Term Normalization

Paper Title

Generalizing over Long Tail Concepts for Medical Term Normalization

Authors

Beatrice Portelli, Simone Scaboro, Enrico Santus, Hooman Sedghamiz, Emmanuele Chersoni, Giuseppe Serra

Abstract

Medical term normalization consists of mapping a piece of text to one of a large number of output classes. Given the small size of the annotated datasets and the extremely long-tailed distribution of the concepts, it is of utmost importance to develop models that are capable of generalizing to scarce or unseen concepts. An important attribute of most target ontologies is their hierarchical structure. In this paper we introduce a simple and effective learning strategy that leverages such information to enhance the generalizability of both discriminative and generative models. The evaluation shows that the proposed strategy produces state-of-the-art performance on seen concepts and consistent improvements on unseen ones, while also allowing for efficient zero-shot knowledge transfer across text typologies and datasets.
