MEW-UNet: Multi-axis Representation Learning in Frequency Domain for Medical Image Segmentation

Paper Title

MEW-UNet: Multi-axis representation learning in frequency domain for medical image segmentation

Authors

Jiacheng Ruan, Mingye Xie, Suncheng Xiang, Ting Liu, Yuzhuo Fu

Abstract

Recently, the Vision Transformer (ViT) has been widely used in various fields of computer vision because it applies a self-attention mechanism in the spatial domain to model global knowledge. In medical image segmentation (MIS) especially, many works are devoted to combining ViT with CNNs, and some directly use pure ViT-based models. However, recent works have improved models in the spatial domain while ignoring the importance of frequency-domain information. Therefore, we propose Multi-axis External Weights UNet (MEW-UNet) for MIS, based on the U-shaped architecture, by replacing the self-attention in ViT with our Multi-axis External Weights block. Specifically, our block performs a Fourier transform along the three axes of the input feature and assigns external weights in the frequency domain, which are generated by our Weights Generator. An inverse Fourier transform is then performed to convert the features back to the spatial domain. We evaluate our model on four datasets and achieve state-of-the-art performance. In particular, on the Synapse dataset, our method outperforms MT-UNet by 10.15 mm in terms of HD95. Code is available at https://github.com/JCruan519/MEW-UNet.
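
The transform, weight, inverse-transform sequence described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository above for that). The function names, the use of a 1-D FFT per axis, the per-axis weight vectors, and the averaging of the three branches are all assumptions made for illustration, and the Weights Generator is not modeled: the weights are simply passed in.

```python
import numpy as np

def filter_along_axis(x, weight, axis):
    """FFT along one axis, multiply by a frequency-domain weight, inverse FFT."""
    spectrum = np.fft.fft(x, axis=axis)
    # Reshape the 1-D weight so it broadcasts along the chosen axis only.
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]
    filtered = spectrum * weight.reshape(shape)
    # The inverse transform is complex in general; this sketch keeps the real part.
    return np.fft.ifft(filtered, axis=axis).real

def multi_axis_frequency_block(x, weights):
    """Apply the frequency-domain weighting along every axis of x.

    Averaging the per-axis branches is an assumption of this sketch,
    not something stated in the abstract.
    """
    branches = [filter_along_axis(x, w, axis) for axis, w in enumerate(weights)]
    return np.mean(branches, axis=0)

rng = np.random.default_rng(0)
feature = rng.standard_normal((4, 8, 8))  # hypothetical (C, H, W) feature map

# All-ones weights leave every frequency untouched, so the block is an identity.
identity_weights = [np.ones(n, dtype=complex) for n in feature.shape]
restored = multi_axis_frequency_block(feature, identity_weights)

# All-zero weights suppress every frequency, so the output vanishes.
zero_weights = [np.zeros(n, dtype=complex) for n in feature.shape]
silenced = multi_axis_frequency_block(feature, zero_weights)
```

With all-ones weights the block returns its input, which is a quick sanity check that the forward and inverse transforms are paired correctly. Learned weights would instead amplify or suppress individual frequencies along each axis.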
