Estimating Uncertainty in Neural Networks for Cardiac MRI Segmentation: A Benchmark Study

Paper Title

Estimating Uncertainty in Neural Networks for Cardiac MRI Segmentation: A Benchmark Study

Authors

Matthew Ng, Fumin Guo, Labonny Biswas, Steffen E. Petersen, Stefan K. Piechnik, Stefan Neubauer, Graham Wright

Abstract

Objective: Convolutional neural networks (CNNs) have demonstrated promise in automated cardiac magnetic resonance image segmentation. However, when using CNNs on a large real-world dataset, it is important to quantify segmentation uncertainty and identify segmentations that could be problematic. In this work, we performed a systematic study of Bayesian and non-Bayesian methods for estimating uncertainty in segmentation neural networks. Methods: We evaluated Bayes by Backprop, Monte Carlo Dropout, Deep Ensembles, and Stochastic Segmentation Networks in terms of segmentation accuracy, probability calibration, uncertainty on out-of-distribution images, and segmentation quality control. Results: We observed that Deep Ensembles outperformed the other methods except on images with heavy noise and blurring distortions. We showed that Bayes by Backprop is more robust to noise distortions, while Stochastic Segmentation Networks are more resistant to blurring distortions. For segmentation quality control, we showed that segmentation uncertainty is correlated with segmentation accuracy for all the methods. By incorporating uncertainty estimates, we were able to reduce the percentage of poor segmentations to 5% by flagging the 31-48% most uncertain segmentations for manual review, substantially fewer than random review without neural network uncertainty would require (reviewing 75-78% of all images). Conclusion: This work provides a comprehensive evaluation of uncertainty estimation methods and shows that Deep Ensembles outperformed the other methods in most cases. Significance: Neural network uncertainty measures can help identify potentially inaccurate segmentations and alert users to the need for manual review.
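The quality-control idea in the abstract (score each segmentation by its uncertainty, then send the most uncertain fraction to manual review) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which uncertainty measure was used, so mean predictive entropy of the averaged ensemble softmax output is an assumption here, and the function names `ensemble_uncertainty` and `flag_for_review` are hypothetical.

```python
import numpy as np


def ensemble_uncertainty(member_probs):
    """Average ensemble softmax outputs and score the image by mean entropy.

    member_probs: array of shape (M, H, W, C) holding per-pixel class
    probabilities from M ensemble members (assumed layout, not from the paper).
    Returns (mean_probs of shape (H, W, C), scalar image-level score).
    """
    probs = np.asarray(member_probs, dtype=float)
    mean_probs = probs.mean(axis=0)  # ensemble-averaged prediction
    eps = 1e-12  # guards against log(0)
    # Per-pixel predictive entropy of the averaged distribution.
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)
    return mean_probs, float(entropy.mean())


def flag_for_review(scores, fraction):
    """Return sorted indices of the `fraction` most uncertain images."""
    scores = np.asarray(scores, dtype=float)
    n_flag = int(np.ceil(fraction * len(scores)))
    order = np.argsort(-scores, kind="stable")  # most uncertain first
    return sorted(order[:n_flag].tolist())
```

With image-level scores computed this way for a whole dataset, a call such as `flag_for_review(scores, 0.31)` would select roughly the most uncertain 31% of images. The review fraction that actually brings poor segmentations down to a target rate has to be determined on held-out data.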
