Paper Title

An Automatic SOAP Classification System Using Weakly Supervision And Transfer Learning

Authors

Sunjae Kwon, Zhichao Yang, Hong Yu

Abstract

In this paper, we introduce a comprehensive framework for developing a machine learning-based SOAP (Subjective, Objective, Assessment, and Plan) classification system that requires little or no manually SOAP-annotated training data. The system is composed of three parts: 1) data construction, 2) a neural network-based SOAP classifier, and 3) a transfer learning framework. For data construction, because manually building a large training dataset is expensive, we propose a rule-based weak labeling method that uses the structured information of an EHR note. We then present a SOAP classifier composed of a pre-trained language model and a bi-directional long short-term memory network with a conditional random field (Bi-LSTM-CRF). Finally, we propose a transfer learning framework that reuses the parameters of the SOAP classifier trained on the weakly labeled dataset for datasets collected from another hospital. The proposed weak label-based learning model successfully performed SOAP classification (89.99 F1-score) on notes collected from the target hospital. However, on notes collected from other hospitals and departments, performance decreased dramatically. We also verified that the transfer learning framework is advantageous for inter-hospital adaptation of the model, increasing the model's performance in every case. In particular, the transfer learning approach was more efficient when the manually annotated dataset was smaller. We showed that SOAP classification models trained with our weak labeling algorithm can perform SOAP classification on EHR notes from the same hospital without manually annotated data. The transfer learning framework helps migrate the SOAP classification model between hospitals with a minimal amount of manually annotated data.
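The rule-based weak labeling idea can be illustrated with a minimal sketch. The abstract does not state the authors' actual rules, so everything here is an assumption: the header names in `HEADER_TO_LABEL`, the `Header: text` line format, and the function name `weak_label` are hypothetical and only show how section structure in a note could supply labels without manual annotation.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical header-to-label map; the paper's real rules are not given in the abstract.
HEADER_TO_LABEL = {
    "chief complaint": "S",
    "history of present illness": "S",
    "physical exam": "O",
    "vital signs": "O",
    "assessment": "A",
    "impression": "A",
    "plan": "P",
}

# Assumed line format: "Header: optional text on the same line".
HEADER_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.*)$")


def weak_label(note: str) -> List[Tuple[str, str]]:
    """Give each non-empty line the label of the most recent recognized header."""
    labeled: List[Tuple[str, str]] = []
    current: Optional[str] = None  # no label until a known header is seen
    for line in note.splitlines():
        text = line.strip()
        if not text:
            continue
        match = HEADER_RE.match(text)
        if match and match.group(1).lower() in HEADER_TO_LABEL:
            current = HEADER_TO_LABEL[match.group(1).lower()]
            text = match.group(2).strip()
            if not text:
                continue  # a header-only line carries no content to label
        if current is not None:
            labeled.append((text, current))
    return labeled


example_note = """Chief Complaint: chest pain for two days
Physical Exam:
BP 120/80
Assessment: stable angina
Plan: start aspirin
follow up in 2 weeks"""

for sentence, label in weak_label(example_note):
    print(label, sentence)
```

Lines that appear before any recognized header are skipped in this sketch. Labels produced this way are noisy, which is why the abstract pairs them with a learned classifier instead of using the rules directly.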
