Title
Intra-model Variability in COVID-19 Classification Using Chest X-ray Images
Authors
Abstract
X-ray and computed tomography (CT) scanning technologies for COVID-19 screening have gained significant traction in AI research since the start of the coronavirus pandemic. Despite continuous advances in COVID-19 screening, many concerns remain about model reliability in clinical settings. Much has been published, but with limited transparency about expected model performance. We set out to address this limitation through a set of experiments that quantify baseline performance metrics and their variability for COVID-19 detection in chest X-rays across 12 common deep learning architectures. Specifically, we adopted an experimental paradigm that controls for the train-validation-test split and the model architecture, so that prediction variability originates from model weight initialization, random data augmentation transformations, and batch shuffling. Each model architecture was trained 5 separate times on identical train-validation-test splits of a publicly available X-ray image dataset provided by Cohen et al. (2020). Results indicate that even within a single model architecture, model behavior varies in a meaningful way between trained models. The best-performing models achieve a false negative rate of 3 out of 20 for detecting COVID-19 in a hold-out set. While these results show promise for using AI in COVID-19 screening, they further support the urgent need for diverse medical imaging datasets on which models can be trained to yield consistent prediction outcomes. We hope these modeling results will accelerate work toward building a more robust dataset and a viable screening tool for COVID-19.
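The experimental paradigm described above keeps the data split fixed while letting three sources of randomness vary between training runs. The following minimal Python sketch illustrates that separation under stated assumptions. The function names (`fixed_split`, `run_randomness`, `false_negative_rate`, `summarize`), the 70/15/15 split proportions, the batch size, and the horizontal-flip augmentation are hypothetical choices for illustration and are not taken from the paper; the "weights" are toy stand-ins rather than a real network. The sketch also shows the false negative rate as FN / (FN + TP).

```python
import random
import statistics


def fixed_split(n_images, split_seed=0, train_pct=70, val_pct=15):
    """Shuffle image indices once with a dedicated seed, then cut train/val/test.

    Reusing the same split_seed reproduces the identical split for every run.
    The 70/15/15 proportions are a hypothetical choice for illustration.
    """
    idx = list(range(n_images))
    random.Random(split_seed).shuffle(idx)
    n_train = n_images * train_pct // 100
    n_val = n_images * val_pct // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def run_randomness(run_seed, train_idx, batch_size=8, n_weights=4):
    """Draw the per-run random choices from a seed that differs between runs.

    Toy stand-ins (hypothetical): a few Gaussian "initial weights", a shuffled
    batch order, and a per-image horizontal-flip decision as the augmentation.
    """
    rng = random.Random(run_seed)
    init_weights = [rng.gauss(0.0, 0.01) for _ in range(n_weights)]
    order = list(train_idx)
    rng.shuffle(order)
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    hflip = {i: rng.random() < 0.5 for i in order}
    return init_weights, batches, hflip


def false_negative_rate(fn, tp):
    """FNR = FN / (FN + TP): fraction of true COVID-19 cases the model missed."""
    return fn / (fn + tp)


def summarize(values):
    """Mean and sample standard deviation of a metric across repeated runs."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    train, val, test = fixed_split(100)
    for run_seed in range(5):  # 5 runs per architecture; the split never changes
        init_weights, batches, hflip = run_randomness(run_seed, train)
        print(run_seed, batches[0][:4])
    print(false_negative_rate(3, 17))  # 3 missed out of 20 positives
```

Assuming "3 out of 20" means 3 of 20 COVID-19-positive hold-out images were missed, `false_negative_rate(3, 17)` returns `0.15`; `summarize` would then report the spread of such a metric across the 5 runs of one architecture. In a real deep learning framework, weight initialization, augmentation, and data-loader shuffling are usually seeded through the framework's own generators rather than a single `random.Random`.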