Paper Title

Fine-Grained Image Style Transfer with Visual Transformers

Paper Authors

Jianbo Wang, Huan Yang, Jianlong Fu, Toshihiko Yamasaki, Baining Guo

Paper Abstract

With the development of convolutional neural networks, image style transfer has drawn increasing attention. However, most existing approaches adopt a global feature transformation to transfer style patterns into content images (e.g., AdaIN and WCT). Such a design usually destroys the spatial information of the input images and fails to transfer fine-grained style patterns into style transfer results. To solve this problem, we propose a novel STyle TRansformer (STTR) network which breaks both content and style images into visual tokens to achieve a fine-grained style transformation. Specifically, two attention mechanisms are adopted in our STTR. We first propose to use self-attention to encode content and style tokens such that similar tokens can be grouped and learned together. We then adopt cross-attention between content and style tokens to encourage fine-grained style transformations. To compare STTR with existing approaches, we conduct user studies on Amazon Mechanical Turk (AMT) with 50 human subjects and 1,000 votes in total. Extensive evaluations demonstrate the effectiveness and efficiency of the proposed STTR in generating visually pleasing style transfer results.
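The abstract describes a two-stage attention design: self-attention first encodes content and style tokens separately, then cross-attention lets content tokens query the style tokens. Below is a minimal PyTorch sketch of that token-level flow. It is not the authors' STTR implementation; the module name, dimensions, and the assumption that tokens are flattened feature-map patches are all illustrative.

```python
# Minimal sketch of the two attention stages described in the abstract.
# NOT the authors' STTR code; shapes and names are illustrative assumptions.
import torch
import torch.nn as nn

class StyleCrossAttentionSketch(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Stage 1: self-attention within each token set, so that similar
        # tokens can be grouped and learned together.
        self.content_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.style_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stage 2: cross-attention, where each content token pulls matching
        # fine-grained style patterns from the style tokens.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, content_tokens, style_tokens):
        # content_tokens: (B, Nc, dim); style_tokens: (B, Ns, dim),
        # e.g. flattened patches of CNN feature maps.
        c, _ = self.content_self_attn(content_tokens, content_tokens, content_tokens)
        s, _ = self.style_self_attn(style_tokens, style_tokens, style_tokens)
        # Content tokens act as queries; style tokens supply keys and values.
        stylized, _ = self.cross_attn(query=c, key=s, value=s)
        return stylized  # per-token stylized features, to be decoded into an image

# Usage: a 16x16 grid of 256-d tokens for each image.
model = StyleCrossAttentionSketch()
out = model(torch.randn(1, 256, 256), torch.randn(1, 256, 256))
print(out.shape)  # torch.Size([1, 256, 256])
```

Because the stylization happens per token rather than through a single global statistic (as in AdaIN or WCT), spatial correspondence between content regions and style patterns can be preserved, which is the fine-grained behavior the abstract emphasizes.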
