Paper Title
Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism
Paper Authors
Paper Abstract
Learning to generate natural scenes has always been a daunting task in computer vision, and it becomes even harder when the generated images come from drastically different viewpoints. When the views differ greatly, their fields of view barely overlap and objects are occluded, making the task very challenging. In this paper, we propose a Generative Adversarial Network (GAN) based on deformable convolution and an attention mechanism to solve the cross-view image synthesis problem (see Fig. 1). Because scene appearance and semantic information are difficult to understand and transform from another view, we use deformable convolution in the U-Net to improve the network's ability to extract features of objects at different scales. Moreover, to better learn the correspondence between images from different views, we apply an attention mechanism to refine the intermediate feature maps, generating more realistic images. Extensive experiments on images of different sizes from the Dayton dataset [1] show that our model produces better results than state-of-the-art methods.
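The attention-based feature refinement mentioned in the abstract can be illustrated with a minimal NumPy sketch. The particular weighting scheme below (channel-mean scores, a softmax over spatial locations, and a residual-style reweighting) is an illustrative assumption, not the paper's exact mechanism:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention_refine(feat):
    """Refine a (C, H, W) feature map with a simple spatial attention weighting.

    An attention map over the H*W locations is computed from the channel-mean
    response and used to reweight the features, so that informative locations
    are emphasized before the generator decodes the map into an image.
    """
    c, h, w = feat.shape
    scores = feat.mean(axis=0).reshape(-1)      # (H*W,) per-location response
    attn = softmax(scores).reshape(1, h, w)     # attention map, sums to 1
    # Residual-style refinement: scale the attention map so that a uniform
    # map (attn == 1 / (H*W)) leaves the features roughly doubled everywhere.
    return feat * (1.0 + attn * h * w)

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 4, 4))              # toy intermediate feature map
refined = spatial_attention_refine(f)
print(refined.shape)
```

In a real model the attention map would be learned (for example, predicted by a small convolutional branch) rather than derived from the channel mean; the sketch only shows where such a map plugs into the feature pipeline.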