Title
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Authors
Abstract
Image inpainting has made significant advances in recent years. However, it remains challenging to recover corrupted images with both vivid textures and reasonable structures. Some specific methods tackle only regular textures while losing holistic structures, due to the limited receptive fields of convolutional neural networks (CNNs). On the other hand, attention-based models can learn better long-range dependencies for structure recovery, but they are limited by heavy computation when inferring on large image sizes. To address these issues, we propose leveraging an additional structure restorer to facilitate image inpainting incrementally. The proposed model restores holistic image structures with a powerful attention-based transformer model in a fixed low-resolution sketch space. Such a grayscale space is easily upsampled to larger scales to convey correct structural information. Our structure restorer can be integrated efficiently with other pretrained inpainting models via zero-initialized residual addition. Furthermore, a masking positional encoding strategy is utilized to improve performance with large irregular masks. Extensive experiments on various datasets validate the efficacy of our model compared with other competitors. Our code is released at https://github.com/DQiaole/ZITS_inpainting.
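To make the "zero-initialized residual addition" mentioned in the abstract concrete, below is a minimal PyTorch sketch of the idea: a learnable gate initialized to zero blends auxiliary structure features into a pretrained inpainting backbone, so the pretrained model's behavior is preserved exactly at the start of fine-tuning and the structure branch is introduced gradually. Module and variable names here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ZeroResidualAddition(nn.Module):
    """Sketch of zero-initialized residual addition.

    The scalar gate `alpha` starts at 0, so the fused output equals the
    pretrained backbone's feature map at initialization; the structure
    branch is blended in only as `alpha` is learned.
    """

    def __init__(self):
        super().__init__()
        # Learnable scalar gate, initialized to zero.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, backbone_feat: torch.Tensor,
                structure_feat: torch.Tensor) -> torch.Tensor:
        # Identity mapping when alpha == 0.
        return backbone_feat + self.alpha * structure_feat


# Usage: fuse upsampled structure features into an inpainting feature map.
fuse = ZeroResidualAddition()
x = torch.randn(1, 64, 64, 64)   # feature map from a pretrained inpainter
s = torch.randn(1, 64, 64, 64)   # feature map from the structure restorer
out = fuse(x, s)                 # equals x exactly at initialization
```

This gating trick (in the spirit of ReZero-style residual connections) is what allows the structure restorer to be attached to an already-trained inpainting model without disturbing it at the start of training.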
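The "masking positional encoding" can likewise be illustrated with a simplified sketch: each masked pixel is tagged with its (quantized) distance to the nearest valid pixel, and a learnable embedding of that distance is added to the feature map inside the hole only, giving the network a positional prior in large irregular masks. The quantization scheme, embedding size, and use of a Euclidean distance transform below are illustrative choices under stated assumptions, not the paper's exact design.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.ndimage import distance_transform_edt


class MaskingPositionalEncoding(nn.Module):
    """Simplified sketch of a masking positional encoding (MPE)."""

    def __init__(self, channels: int, max_dist: int = 128):
        super().__init__()
        self.max_dist = max_dist
        # One learnable embedding per quantized distance value.
        self.embed = nn.Embedding(max_dist + 1, channels)

    def forward(self, feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); mask: (B, 1, H, W), 1 inside the hole.
        dist_maps = []
        for i in range(mask.shape[0]):
            m = mask[i, 0].detach().cpu().numpy()
            # Distance from each hole pixel to the nearest valid pixel.
            d = distance_transform_edt(m > 0.5)
            dist_maps.append(np.clip(d, 0, self.max_dist))
        dist = torch.from_numpy(np.stack(dist_maps)).long().to(feat.device)
        pe = self.embed(dist).permute(0, 3, 1, 2)  # (B, C, H, W)
        # Add the encoding only where the image is masked.
        return feat + pe * mask
```

The intent is that valid pixels are left untouched while pixels deep inside a large hole receive a distinct positional signal, which is the property the abstract credits for improved performance on large irregular masks.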