Paper Title

Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision

Authors

Haitian Zheng, Haofu Liao, Lele Chen, Wei Xiong, Tianlang Chen, Jiebo Luo

Abstract

Example-guided image synthesis has recently been attempted to synthesize an image from a semantic label map and an exemplary image. In the task, the additional exemplar image provides the style guidance that controls the appearance of the synthesized output. Despite the controllability advantage, the existing models are designed on datasets with specific and roughly aligned objects. In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map. To this end, we first propose a Masked Spatial-Channel Attention (MSCA) module which models the correspondence between two arbitrary scenes via efficient decoupled attention. Next, we propose an end-to-end network for joint global and local feature alignment and synthesis. Finally, we propose a novel self-supervision task to enable training. Experiments on the large-scale and more diverse COCO-stuff dataset show significant improvements over the existing methods. Moreover, our approach provides interpretability and can be readily extended to other content manipulation tasks including style and spatial interpolation or extrapolation.
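To make the decoupled attention idea in the abstract more concrete, below is a minimal PyTorch sketch of a masked spatial-channel attention-style block. This is not the authors' MSCA implementation: the module name, the 1x1 query/key/value projections, the softmax-based spatial pooling, and the mask normalization are all assumptions made for illustration. The sketch only shows how full pairwise attention between a label-map feature and an exemplar-scene feature can be factored into a spatial pooling step followed by a channel mixing step, reducing the cost from O((HW)^2 * C) to O(HW * C^2).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoupledSpatialChannelAttention(nn.Module):
    """Illustrative sketch (not the paper's exact MSCA) of decoupled
    attention between a label-map feature and an exemplar-scene feature."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 projections (an assumption; the paper's design may differ).
        self.query_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.key_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.value_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, label_feat, exemplar_feat, mask=None):
        # label_feat:    (B, C, H, W) features from the semantic label map
        # exemplar_feat: (B, C, H, W) features from the exemplar scene image
        # mask:          (B, 1, H, W) optional mask over valid exemplar regions
        b, c, h, w = exemplar_feat.shape
        q = self.query_proj(label_feat).flatten(2)     # (B, C, HW)
        k = self.key_proj(exemplar_feat).flatten(2)    # (B, C, HW)
        v = self.value_proj(exemplar_feat).flatten(2)  # (B, C, HW)

        # Spatial step: each of the C key channels acts as a soft spatial
        # attention map that pools exemplar values into a region descriptor.
        spatial_attn = F.softmax(k, dim=-1)            # (B, C, HW)
        if mask is not None:
            # Suppress masked-out exemplar positions, then renormalize.
            spatial_attn = spatial_attn * mask.flatten(2)
            spatial_attn = spatial_attn / (spatial_attn.sum(-1, keepdim=True) + 1e-6)
        regions = torch.bmm(spatial_attn, v.transpose(1, 2))  # (B, C, C)

        # Channel step: every query location mixes the C region descriptors,
        # so the total cost is O(HW * C^2) rather than O((HW)^2 * C) for
        # full pairwise spatial attention.
        channel_attn = F.softmax(q.transpose(1, 2) / c ** 0.5, dim=-1)  # (B, HW, C)
        aligned = torch.bmm(channel_attn, regions)                      # (B, HW, C)
        return aligned.transpose(1, 2).reshape(b, c, h, w)


# Example usage on random tensors:
attn = DecoupledSpatialChannelAttention(channels=64)
out = attn(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32),
           mask=torch.ones(2, 1, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The intermediate (C, C) "region descriptor" matrix is also what gives this kind of factorized attention its interpretability: each channel's spatial attention map can be visualized as a soft region of the exemplar scene.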
