Why I'm not Answering: Understanding Determinants of Classification of an Abstaining Classifier for Cancer Pathology Reports

Paper title

Why I'm not Answering: Understanding Determinants of Classification of an Abstaining Classifier for Cancer Pathology Reports

Authors

Sayera Dhaubhadel, Jamaludin Mohd-Yusof, Kumkum Ganguly, Gopinath Chennupati, Sunil Thulasidasan, Nicolas W. Hengartner, Brent J. Mumphrey, Eric B. Durbin, Jennifer A. Doherty, Mireille Lemieux, Noah Schaefferkoetter, Georgia Tourassi, Linda Coyle, Lynne Penberthy, Benjamin H. McMahon, Tanmoy Bhattacharya

Abstract

Safe deployment of deep learning systems in critical real world applications requires models to make very few mistakes, and only under predictable circumstances. In this work, we address this problem using an abstaining classifier that is tuned to have >95% accuracy, and then identify the determinants of abstention using LIME. Essentially, we are training our model to learn the attributes of pathology reports that are likely to lead to incorrect classifications, albeit at the cost of reduced sensitivity. We demonstrate an abstaining classifier in a multitask setting for classifying cancer pathology reports from the NCI SEER cancer registries on six tasks of interest. For these tasks, we reduce the classification error rate by factors of 2 to 5 by abstaining on 25 to 45% of the reports. For the specific task of classifying cancer site, we are able to identify metastasis, reports involving lymph nodes, and discussion of multiple cancer sites as responsible for many of the classification mistakes, and observe that the extent and types of mistakes vary systematically with cancer site (e.g., breast, lung, and prostate). When combining across three of the tasks, our model classifies 50% of the reports with an accuracy greater than 95% for three of the six tasks, and greater than 85% for all six tasks on the retained samples. Furthermore, we show that LIME provides a better determinant of classification than measures of word occurrence alone. By combining a deep abstaining classifier with feature identification using LIME, we are able to identify concepts responsible for both correctness and abstention when classifying cancer sites from pathology reports. The improvement of LIME over keyword searches is statistically significant, presumably because words are assessed in context and have been identified as a local determinant of classification.
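
The trade-off the abstract describes, fewer errors on the retained reports in exchange for abstaining on some of them, can be illustrated with a minimal sketch. This is not the authors' code: the paper's classifier learns to abstain during training, whereas the snippet below only scores predictions that have already been produced. The `ABSTAIN` label, the `retained_metrics` helper, and the toy labels are hypothetical names and data chosen for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical sentinel marking "the model declined to answer".
ABSTAIN = -1


def retained_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], abstain: int = ABSTAIN
) -> Tuple[float, float]:
    """Return (coverage, accuracy on retained samples).

    coverage is the fraction of samples the model chose to classify;
    accuracy is computed only over those retained samples.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0, float("nan")

    # Keep only the samples on which the model did not abstain.
    retained = [(t, p) for t, p in zip(y_true, y_pred) if p != abstain]
    coverage = len(retained) / len(y_true)
    if not retained:
        return coverage, float("nan")

    correct = sum(1 for t, p in retained if t == p)
    return coverage, correct / len(retained)


# Toy data (made up): four reports, the model abstains on the second.
coverage, accuracy = retained_metrics([0, 1, 2, 1], [0, ABSTAIN, 2, 2])
print(f"coverage={coverage:.2f}, retained accuracy={accuracy:.2f}")
```

Reporting coverage alongside retained accuracy matters because either number alone is easy to inflate: a model that abstains on nearly everything can look very accurate on the few samples it keeps.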
