Paper Title
MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning
Paper Authors
Paper Abstract
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person. Previous work needs to collect a several-minute-long video of a target person, containing thousands of frames, to train a personalized model. However, the trained model can only generate videos of that same person. To address this limitation, recent work has tackled few-shot dancing video retargeting, which learns to synthesize videos of unseen persons by leveraging only a few frames of them. In practice, given a few frames of a person, these works simply regard them as a batch of individual images without temporal correlations, thus generating temporally incoherent dancing videos of low visual quality. In this work, we model a few frames of a person as a series of dancing moves, where each move contains two consecutive frames, to extract the appearance patterns and the temporal dynamics of this person. We propose MetaDance, which utilizes temporal-aware meta-learning to optimize the initialization of a model through the synthesis of dancing moves, such that the meta-trained model can be efficiently tuned towards enhanced visual quality and strengthened temporal stability for unseen persons with a few frames. Extensive evaluations show the large superiority of our method.
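To make the meta-learning idea in the abstract concrete, the snippet below is a minimal, illustrative sketch of a first-order (Reptile-style) meta-learning loop in which each task is one person and each training sample is a "dancing move", i.e. a pair of consecutive frames. The toy MoveSynthesizer network, the pose/frame tensor shapes, the L1 loss, and the learning rates are all assumptions made for illustration; this is not the paper's actual architecture or training procedure.

```python
# Illustrative sketch: temporal-aware meta-learning over "dancing moves"
# (pairs of consecutive frames). All names, shapes, and the first-order
# meta-update are assumptions, not the authors' released implementation.
import copy
import torch
import torch.nn as nn

class MoveSynthesizer(nn.Module):
    """Toy generator: maps two consecutive pose maps to two frames."""
    def __init__(self, pose_channels=3, frame_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * pose_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2 * frame_channels, 3, padding=1),
        )

    def forward(self, pose_pair):        # (B, 2*C, H, W)
        return self.net(pose_pair)       # (B, 2*C, H, W): two synthesized frames

def inner_adapt(model, moves, lr_inner=1e-3, steps=2):
    """Adapt a copy of the shared initialization to one person's few moves."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr_inner)
    loss_fn = nn.L1Loss()
    for _ in range(steps):
        for pose_pair, frame_pair in moves:   # one move = 2 consecutive frames
            opt.zero_grad()
            loss_fn(adapted(pose_pair), frame_pair).backward()
            opt.step()
    return adapted

def meta_train(model, tasks, lr_outer=0.1, epochs=10):
    """First-order meta-update: move the shared initialization toward the
    parameters adapted on each person's dancing moves."""
    for _ in range(epochs):
        for moves in tasks:                   # one task = one person's moves
            adapted = inner_adapt(model, moves)
            with torch.no_grad():
                for p, q in zip(model.parameters(), adapted.parameters()):
                    p.add_(lr_outer * (q - p))
    return model

if __name__ == "__main__":
    # Synthetic data: 2 persons, 4 moves each, 64x64 pose/frame pairs.
    tasks = [
        [(torch.randn(1, 6, 64, 64), torch.randn(1, 6, 64, 64)) for _ in range(4)]
        for _ in range(2)
    ]
    meta_model = meta_train(MoveSynthesizer(), tasks)
```

At test time, the same inner_adapt step would be run on the few frames of an unseen person to obtain a personalized model; the point of meta-training the initialization is that this adaptation converges quickly from only a handful of moves.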