Paper Title

ConvFormer: Combining CNN and Transformer for Medical Image Segmentation

Paper Authors

Gu, Pengfei, Zhang, Yejia, Wang, Chaoli, Chen, Danny Z.

Paper Abstract

Convolutional neural network (CNN) based methods have achieved great successes in medical image segmentation, but their capability to learn global representations is still limited due to using small effective receptive fields of convolution operations. Transformer based methods are capable of modelling long-range dependencies of information for capturing global representations, yet their ability to model local context is lacking. Integrating CNN and Transformer to learn both local and global representations while exploring multi-scale features is instrumental in further improving medical image segmentation. In this paper, we propose a hierarchical CNN and Transformer hybrid architecture, called ConvFormer, for medical image segmentation. ConvFormer is based on several simple yet effective designs. (1) A feed forward module of Deformable Transformer (DeTrans) is re-designed to introduce local information, called Enhanced DeTrans. (2) A residual-shaped hybrid stem based on a combination of convolutions and Enhanced DeTrans is developed to capture both local and global representations to enhance representation ability. (3) Our encoder utilizes the residual-shaped hybrid stem in a hierarchical manner to generate feature maps in different scales, and an additional Enhanced DeTrans encoder with residual connections is built to exploit multi-scale features with feature maps of different scales as input. Experiments on several datasets show that our ConvFormer, trained from scratch, outperforms various CNN- or Transformer-based architectures, achieving state-of-the-art performance.
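The abstract's first design point is a feed-forward module re-designed to inject local information into the Deformable Transformer. The paper does not give the exact layers, but a common way to achieve this is to insert a depthwise convolution between the two linear layers of the FFN so that neighboring tokens are mixed spatially. The sketch below illustrates that idea only; the class name `ConvFFN`, the layer order, and the hidden size are assumptions, not the authors' exact Enhanced DeTrans module.

```python
import torch
import torch.nn as nn

class ConvFFN(nn.Module):
    """Hedged sketch of a convolution-augmented feed-forward block.
    A depthwise 3x3 conv between the two linear layers adds local
    spatial context to an otherwise position-wise FFN. Exact design
    details of the paper's Enhanced DeTrans are assumptions here."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        # groups=hidden_dim makes this a depthwise conv: each channel
        # is filtered independently, mixing only spatial neighbors.
        self.dwconv = nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3,
                                padding=1, groups=hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x: torch.Tensor, H: int, W: int) -> torch.Tensor:
        # x: (B, N, C) token sequence flattened from an (H, W) feature map.
        B, N, C = x.shape
        x = self.fc1(x)                           # (B, N, hidden)
        x = x.transpose(1, 2).reshape(B, -1, H, W)  # back to a 2D map
        x = self.dwconv(x)                        # local spatial mixing
        x = x.flatten(2).transpose(1, 2)          # (B, N, hidden)
        x = self.act(x)
        return self.fc2(x)                        # (B, N, C)

# Usage: tokens flattened from an 8x8 feature map with 64 channels.
ffn = ConvFFN(dim=64, hidden_dim=128)
tokens = torch.randn(2, 64, 64)  # (batch, N=H*W, channels)
out = ffn(tokens, H=8, W=8)
print(out.shape)  # same shape as the input tokens
```

The depthwise conv adds only `9 * hidden_dim` weights, so the local-context injection is cheap relative to the linear layers.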
