Using Bottleneck Adapters to Identify Cancer in Clinical Notes under Low-Resource Constraints

Paper title

Using Bottleneck Adapters to Identify Cancer in Clinical Notes under Low-Resource Constraints

Authors

Omid Rohanian, Hannah Jauncey, Mohammadmahdi Nouriborji, Vinod Kumar Chauhan, Bronner P. Gonçalves, Christiana Kartsonaki, ISARIC Clinical Characterisation Group, Laura Merson, David Clifton

Abstract

Processing information locked within clinical health records is a challenging task that remains an active area of research in biomedical NLP. In this work, we evaluate a broad set of machine learning techniques, ranging from simple RNNs to specialised transformers such as BioBERT, on a dataset containing clinical notes along with a set of annotations indicating whether a sample is cancer-related or not. Furthermore, we specifically employ efficient fine-tuning methods from NLP, namely bottleneck adapters and prompt tuning, to adapt the models to our specialised task. Our evaluations suggest that fine-tuning a frozen BERT model pre-trained on natural language and equipped with bottleneck adapters outperforms all other strategies, including full fine-tuning of the specialised BioBERT model. Based on our findings, we suggest that using bottleneck adapters in low-resource situations with limited access to labelled data or processing capacity could be a viable strategy in biomedical text mining. The code used in the experiments is going to be made available at https://github.com/omidrohanian/bottleneck-adapters.
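
For readers unfamiliar with the term, the following is a minimal sketch of the general bottleneck-adapter idea (down-projection, nonlinearity, up-projection, residual connection), not the authors' implementation. The hidden size of 768, the bottleneck size of 64, the ReLU activation, the zero-initialised up-projection, and the omission of bias terms are all illustrative assumptions that do not come from the abstract. In practice only these small adapter weights would be trained while the pre-trained model's own weights stay frozen.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class BottleneckAdapter:
    """Down-project, apply ReLU, up-project, then add the input back."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection: hidden_size -> bottleneck_size (small random init).
        self.W_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(bottleneck_size)]
        # Up-projection: bottleneck_size -> hidden_size. Zero init (an
        # assumption) makes the adapter an identity function before training.
        self.W_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def forward(self, h):
        z = [max(0.0, v) for v in matvec(self.W_down, h)]  # ReLU bottleneck
        delta = matvec(self.W_up, z)
        return [a + b for a, b in zip(h, delta)]  # residual connection

    def num_parameters(self):
        return (sum(len(row) for row in self.W_down)
                + sum(len(row) for row in self.W_up))


# Hypothetical sizes: 768 matches BERT-base hidden states, 64 is arbitrary.
adapter = BottleneckAdapter(hidden_size=768, bottleneck_size=64)
h = [0.1] * 768
out = adapter.forward(h)
print(adapter.num_parameters())
```

With these assumed sizes, one adapter holds 2 × 768 × 64 weights, which is small next to the frozen model it is attached to; this is what makes the approach attractive when labelled data or compute is limited.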
