Paper title
Improving Precancerous Case Characterization via Transformer-based Ensemble Learning
Authors
Abstract
The application of natural language processing (NLP) to cancer pathology reports has focused on detecting cancer cases, largely ignoring precancerous cases. Improving the characterization of precancerous adenomas assists in developing diagnostic tests for early cancer detection and prevention, especially for colorectal cancer (CRC). Here we developed transformer-based deep neural network NLP models to perform CRC phenotyping, with the goal of extracting precancerous lesion attributes and distinguishing cancer from precancerous cases. We achieved a macro-F1 score of 0.914 for classifying patients into four classes: negative, non-advanced adenoma, advanced adenoma, and CRC. We further improved the macro-F1 score to 0.923 using an ensemble of classifiers for cancer status classification and lesion size named entity recognition (NER). Our results demonstrate the potential of using NLP to leverage real-world health record data to facilitate the development of diagnostic tests for early cancer prevention.
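For readers unfamiliar with the metric, the macro-F1 score reported above is the unweighted mean of the per-class F1 scores, so each of the four classes counts equally regardless of how many patients fall into it. The following is a minimal sketch of that computation; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper's code or data.

```python
from collections import Counter

# The four patient-level classes named in the abstract.
CLASSES = ["negative", "non-advanced adenoma", "advanced adenoma", "CRC"]


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    per_class = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    return sum(per_class) / len(classes)


# Toy labels for illustration only (hypothetical, not the study's data).
y_true = ["negative", "negative", "non-advanced adenoma",
          "advanced adenoma", "CRC", "CRC"]
y_pred = ["negative", "non-advanced adenoma", "non-advanced adenoma",
          "advanced adenoma", "CRC", "advanced adenoma"]

print(round(macro_f1(y_true, y_pred), 3))  # 0.667
```

Because every class is weighted equally, errors on a rare class such as advanced adenoma lower the score as much as errors on a common class, which is one reason macro averaging suits imbalanced clinical cohorts.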