SSiT: Saliency-guided Self-supervised Image Transformer for Diabetic Retinopathy Grading

Paper Title

SSiT: Saliency-guided Self-supervised Image Transformer for Diabetic Retinopathy Grading

Authors

Yijin Huang, Junyan Lyu, Pujin Cheng, Roger Tam, Xiaoying Tang

Abstract

Self-supervised learning (SSL) has been widely applied to learn image representations from unlabeled images, but it has not been fully explored in medical image analysis. In this work, the Saliency-guided Self-Supervised image Transformer (SSiT) is proposed for diabetic retinopathy (DR) grading from fundus images. We introduce saliency maps into SSL, with the goal of guiding self-supervised pre-training with domain-specific prior knowledge. Specifically, SSiT employs two saliency-guided learning tasks: (1) Saliency-guided contrastive learning is built on momentum contrast, wherein the saliency maps of fundus images are used to remove trivial patches from the input sequences of the momentum-updated key encoder. The key encoder is thereby constrained to provide target representations that focus on salient regions, which guides the query encoder to capture salient features. (2) The query encoder is trained to predict the saliency segmentation, which encourages the learned representations to preserve fine-grained information. To assess the proposed method, four publicly accessible fundus image datasets are adopted. One dataset is used for pre-training, while the other three are used to evaluate the pre-trained models on downstream DR grading. The proposed SSiT significantly outperforms other representative state-of-the-art SSL methods on all downstream datasets and under various evaluation settings. For example, SSiT achieves a Kappa score of 81.88% on the DDR dataset under fine-tuning evaluation, outperforming all other ViT-based SSL methods by at least 9.48%.
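The first task in the abstract, removing trivial patches from the key encoder's input and updating that encoder by momentum, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `keep_ratio` parameter, the rule of ranking patches by mean saliency, and the momentum value `m` are all assumptions made for illustration, since the abstract does not state how trivial patches are selected.

```python
from typing import List, Sequence


def patch_saliency(saliency: Sequence[Sequence[float]], patch: int) -> List[float]:
    """Mean saliency of each non-overlapping patch, in row-major order."""
    height, width = len(saliency), len(saliency[0])
    scores = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            total = sum(
                saliency[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            scores.append(total / (patch * patch))
    return scores


def keep_salient_patches(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the most salient patches, returned in sequence order.

    Assumption: "trivial" patches are those with the lowest mean saliency.
    """
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def momentum_update(
    key_params: Sequence[float], query_params: Sequence[float], m: float = 0.99
) -> List[float]:
    """MoCo-style moving average: key <- m * key + (1 - m) * query."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Toy 4x4 saliency map split into four 2x2 patches.
toy_map = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
toy_scores = patch_saliency(toy_map, patch=2)
# Only these patch indices would be passed to the key encoder.
key_patch_ids = keep_salient_patches(toy_scores, keep_ratio=0.5)
```

In this sketch the query encoder would still receive every patch, while the key encoder receives only the patches listed in `key_patch_ids`, so its target representation is computed from salient regions alone.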
