Controllable Video Generation through Global and Local Motion Dynamics

Paper Title

Controllable Video Generation through Global and Local Motion Dynamics

Authors

Aram Davtyan, Paolo Favaro

Abstract

We present GLASS, a method for Global and Local Action-driven Sequence Synthesis. GLASS is a generative model that is trained on video sequences in an unsupervised manner and that can animate an input image at test time. The method learns to segment frames into foreground-background layers and to generate transitions of the foregrounds over time through a global and local action representation. Global actions are explicitly related to 2D shifts, while local actions are instead related to (both geometric and photometric) local deformations. GLASS uses a recurrent neural network to transition between frames and is trained through a reconstruction loss. We also introduce W-Sprites (Walking Sprites), a novel synthetic dataset with a predefined action space. We evaluate our method on both W-Sprites and real datasets, and find that GLASS is able to generate realistic video sequences from a single input image and to successfully learn a more advanced action space than in prior work.
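The abstract describes global actions as 2D shifts applied to a segmented foreground layer. As a rough illustration of that idea only, the sketch below translates a foreground layer and its mask by an integer offset and blends the result over a static background. It is not the GLASS implementation: the function names (`shift_layer`, `composite`, `apply_global_action`), the integer pixel shift, and the simple alpha blend are all assumptions made for this example, and the learned parts of the method (segmentation, local actions, the recurrent network) are left out.

```python
# Illustrative sketch only; all names here are hypothetical, not from the paper.

def shift_layer(layer, dx, dy, fill=0.0):
    """Translate a 2D grid by (dx, dy) pixels; uncovered cells get `fill`."""
    h, w = len(layer), len(layer[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = layer[y][x]
    return out


def composite(foreground, mask, background):
    """Alpha-blend the foreground over the background with a mask in [0, 1]."""
    return [
        [m * f + (1.0 - m) * b for f, m, b in zip(frow, mrow, brow)]
        for frow, mrow, brow in zip(foreground, mask, background)
    ]


def apply_global_action(foreground, mask, background, dx, dy):
    """Move the foreground layer by a 2D shift, then recompose the frame."""
    moved_fg = shift_layer(foreground, dx, dy)
    moved_mask = shift_layer(mask, dx, dy)
    return composite(moved_fg, moved_mask, background)


# Toy 3x4 grayscale frame: one bright foreground pixel on a gray background.
background = [[0.5] * 4 for _ in range(3)]
foreground = [[0.0] * 4 for _ in range(3)]
mask = [[0.0] * 4 for _ in range(3)]
foreground[1][0] = 1.0
mask[1][0] = 1.0

# Shift the foreground two pixels to the right.
frame = apply_global_action(foreground, mask, background, dx=2, dy=0)
```

In this toy setup the bright pixel moves from column 0 to column 2 of the middle row, and the vacated position shows the background again. A learned model would predict the shift and the layers rather than take them as inputs.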

Paper Title