Paper Title

Towards Photo-Realistic Virtual Try-On by Adaptively Generating$\leftrightarrow$Preserving Image Content

Authors

Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, Wangmeng Zuo, Ping Luo

Abstract

Image-based visual try-on aims at transferring a target clothing image onto a reference person, and has become a hot topic in recent years. Prior works usually focus on preserving the character of a clothing image (e.g., texture, logo, embroidery) when warping it to an arbitrary human pose. However, it remains a big challenge to generate photo-realistic try-on images when large occlusions and complex human poses are present in the reference person. To address this issue, we propose a novel visual try-on network, namely the Adaptive Content Generating and Preserving Network (ACGPN). In particular, ACGPN first predicts the semantic layout of the reference image that will be changed after try-on (e.g., long sleeve shirt$\rightarrow$arm, arm$\rightarrow$jacket), and then determines whether its image content needs to be generated or preserved according to the predicted semantic layout, leading to photo-realistic try-on results with rich clothing details. ACGPN involves three major modules. First, a semantic layout generation module utilizes the semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on. Second, a clothes warping module warps clothing images according to the generated semantic layout, where a second-order difference constraint is introduced to stabilize the warping process during training. Third, an inpainting module for content fusion integrates all information (e.g., reference image, semantic layout, warped clothes) to adaptively produce each semantic part of the human body. In comparison to state-of-the-art methods, ACGPN generates photo-realistic images with much better perceptual quality and richer fine details.
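The second-order difference constraint mentioned in the abstract regularizes the grid of warping control points so that each interior point stays close to the midpoint of its neighbors, discouraging severe local distortion. As a minimal sketch (not the authors' implementation; the function name, grid shape, and squared-error form are assumptions for illustration):

```python
import numpy as np

def second_order_difference_penalty(points):
    """Penalty on the second-order differences of a control-point grid.

    points: array of shape (H, W, 2), a grid of 2-D warping control points.
    Returns a scalar that is zero when every interior point is exactly the
    midpoint of its horizontal and vertical neighbors (a locally linear warp),
    and grows as the grid bends or distorts locally.
    """
    # Horizontal second-order differences: p[i, j-1] - 2*p[i, j] + p[i, j+1]
    dxx = points[:, :-2] - 2.0 * points[:, 1:-1] + points[:, 2:]
    # Vertical second-order differences: p[i-1, j] - 2*p[i, j] + p[i+1, j]
    dyy = points[:-2, :] - 2.0 * points[1:-1, :] + points[2:, :]
    return float(np.sum(dxx ** 2) + np.sum(dyy ** 2))

# An undeformed regular grid incurs zero penalty; perturbing one control
# point makes the penalty positive.
grid = np.stack(np.meshgrid(np.arange(5.0), np.arange(5.0)), axis=-1)
```

In training, a term like this would be added to the warping module's loss so that the estimated transformation remains smooth even for difficult poses.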
