Title
BERT for Long Documents: A Case Study of Automated ICD Coding
Authors
Abstract
Transformer models have achieved great success across many NLP problems. However, previous studies on automated ICD coding concluded that these models fail to outperform earlier solutions such as CNN-based models. In this paper, we challenge this conclusion. We present a simple and scalable method for processing long text with existing transformer models such as BERT. We show that this method significantly improves on previous results reported for transformer models in ICD coding, and is able to outperform one of the prominent CNN-based methods.
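The abstract does not spell out how long documents are made to fit BERT's fixed input limit. A common way to make this scalable, which may or may not match the paper's exact method, is to split the tokenized document into fixed-size, optionally overlapping chunks, encode each chunk independently, and pool the chunk representations. The sketch below shows only the chunking step; the chunk size of 512 and the stride are illustrative values, not taken from the paper.

```python
def chunk_tokens(token_ids, chunk_size=512, stride=256):
    """Split a long token-ID sequence into overlapping fixed-size chunks.

    Each chunk would be encoded by BERT independently; the resulting
    chunk representations can then be pooled (e.g., max or mean) before
    the per-label classification layer. Parameters here are illustrative.
    """
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + chunk_size])
        if start + chunk_size >= len(token_ids):
            break  # the final chunk reaches the end of the document
        start += stride  # stride < chunk_size gives overlapping context
    return chunks
```

Because each chunk is encoded independently, memory grows linearly with document length rather than quadratically with sequence length, which is what makes such an approach scalable to the long clinical notes used in ICD coding.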