Image-to-Image MLP-mixer for Image Reconstruction

Paper title

Image-to-Image MLP-mixer for Image Reconstruction

Authors

Youssef Mansour, Kang Lin, Reinhard Heckel

Abstract

Neural networks are highly effective tools for image reconstruction problems such as denoising and compressive sensing. To date, neural networks for image reconstruction are almost exclusively convolutional. The most popular architecture is the U-Net, a convolutional network with a multi-resolution architecture. In this work, we show that a simple network based on the multi-layer perceptron (MLP)-mixer achieves state-of-the-art image reconstruction performance without convolutions and without a multi-resolution architecture, provided that the training set and the network are moderately large. Like the original MLP-mixer, the image-to-image MLP-mixer is based exclusively on MLPs operating on linearly transformed image patches. Unlike the original MLP-mixer, it incorporates structure by retaining the relative positions of the image patches. This imposes an inductive bias towards natural images, which enables the image-to-image MLP-mixer to learn to denoise images from fewer examples than the original MLP-mixer. Moreover, the image-to-image MLP-mixer requires fewer parameters than the U-Net to achieve the same denoising performance, and its parameter count scales linearly in the image resolution rather than quadratically, as it does for the original MLP-mixer. When trained on a moderate number of examples for denoising, the image-to-image MLP-mixer outperforms the U-Net by a slight margin. It also outperforms a vision transformer tailored for image reconstruction and classical untrained methods such as BM3D, making it a very effective tool for image reconstruction problems.
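The abstract does not spell out how the patch positions are retained, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that patches are kept on a 2-D grid and that a separate small MLP mixes along each grid axis (height, width) and along the feature axis. All function names, the ReLU nonlinearity, the residual connections, and the hidden sizes are illustrative assumptions. The last helper counts token-mixing weights to illustrate why such a layout could scale linearly rather than quadratically with the number of patches.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W, C) image into a (H/p, W/p, p*p*C) grid of flattened patches."""
    h, w, c = img.shape
    gh, gw = h // p, w // p
    x = img.reshape(gh, p, gw, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh, gw, p * p * c)

def unpatchify(x, p, c):
    """Inverse of patchify: (gh, gw, p*p*c) grid back to a (gh*p, gw*p, c) image."""
    gh, gw, _ = x.shape
    x = x.reshape(gh, gw, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * p, gw * p, c)

def mlp(x, w1, w2):
    """Two-layer MLP along the last axis (ReLU is an assumption)."""
    return np.maximum(x @ w1, 0.0) @ w2

def mix_axis(x, axis, w1, w2):
    """Apply an MLP along one axis of the patch grid, with a residual connection."""
    y = np.moveaxis(x, axis, -1)
    y = mlp(y, w1, w2)
    return x + np.moveaxis(y, -1, axis)

def init_params(shape, rng, scale=0.02):
    """One (w1, w2) pair per axis; hidden size equals the axis size (assumption)."""
    return [(scale * rng.standard_normal((d, d)),
             scale * rng.standard_normal((d, d))) for d in shape]

def mixer_block(x, params):
    """Mix along height, then width, then features; the grid layout is never flattened."""
    for axis, (w1, w2) in enumerate(params):
        x = mix_axis(x, axis, w1, w2)
    return x

def token_mixing_weights(gh, gw, keep_grid):
    """Weight count of the token-mixing MLPs (biases ignored, hidden size = input size)."""
    if keep_grid:
        # One MLP per grid axis: 2*gh^2 + 2*gw^2 weights.
        return 2 * gh * gh + 2 * gw * gw
    # Flattened tokens: a single MLP over all gh*gw patches.
    n = gh * gw
    return 2 * n * n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.standard_normal((8, 12, 3))
    x = patchify(img, 4)                      # grid of shape (2, 3, 48)
    y = mixer_block(x, init_params(x.shape, rng))
    out = unpatchify(y, 4, 3)                 # same shape as the input image
    print(out.shape)
    print(token_mixing_weights(64, 64, True), token_mixing_weights(64, 64, False))
```

Under these assumptions, a 64 x 64 patch grid needs 16,384 token-mixing weights when the grid is kept, versus 33,554,432 when all 4,096 patches are flattened into a single token axis. For a square grid, the per-axis count grows with the number of patches, while the flattened count grows with its square.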
