Semi-Supervised Video Inpainting with Cycle Consistency Constraints

Paper Title

Semi-Supervised Video Inpainting with Cycle Consistency Constraints

Authors

Zhiliang Wu, Hanyu Xuan, Changchang Sun, Kang Zhang, Yan Yan

Abstract

Deep learning-based video inpainting has yielded promising results and attracted increasing attention from researchers. These methods generally assume that the corrupted-region mask of each frame is known and easily obtained. However, annotating these masks is labor-intensive and expensive, which limits the practical application of current methods. We therefore relax this assumption by defining a new semi-supervised inpainting setting, in which the networks must complete the corrupted regions of the whole video using the annotated mask of only one frame. Specifically, we propose an end-to-end trainable framework consisting of a completion network and a mask prediction network, which are designed, respectively, to generate the corrupted contents of the current frame using the known mask and to decide the regions to be filled in the next frame. In addition, we introduce a cycle consistency loss to regularize the training parameters of these two networks. In this way, the completion network and the mask prediction network constrain each other, and hence the overall performance of the trained model can be maximized. Furthermore, because current video inpainting datasets naturally carry prior knowledge (e.g., corrupted contents and clear borders), they are not suitable for semi-supervised video inpainting. We therefore create a new dataset by simulating corrupted videos of real-world scenarios. Extensive experimental results are reported to demonstrate the superiority of our model on the video inpainting task. Remarkably, although our model is trained in a semi-supervised manner, it achieves performance comparable to that of fully supervised methods.
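The semi-supervised setting described in the abstract can be pictured as a simple frame-by-frame loop: only the first frame's mask is given, and each later mask is predicted before that frame is completed. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function name `inpaint_video`, the callables `complete` and `predict_next_mask`, and in particular the inputs passed to the mask predictor (the completed current frame and the next corrupted frame) are hypothetical choices made for illustration; the toy stand-ins at the bottom are not neural networks.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[int]   # toy 1-D "frame"; a real frame would be an image tensor
Mask = List[bool]   # True marks a corrupted pixel that must be filled


def inpaint_video(
    frames: Sequence[Frame],
    first_mask: Mask,
    complete: Callable[[Frame, Mask], Frame],
    predict_next_mask: Callable[[Frame, Frame], Mask],
) -> Tuple[List[Frame], List[Mask]]:
    """Complete every frame using only the annotated mask of the first frame.

    `complete` stands in for the completion network and `predict_next_mask`
    for the mask prediction network (both hypothetical interfaces).
    """
    completed: List[Frame] = []
    masks: List[Mask] = []
    mask = first_mask
    for t, frame in enumerate(frames):
        masks.append(mask)
        filled = complete(frame, mask)          # fill the current frame
        completed.append(filled)
        if t + 1 < len(frames):
            # Assumed inputs: completed current frame + next corrupted frame.
            mask = predict_next_mask(filled, frames[t + 1])
    return completed, masks


# Toy stand-ins: corrupted pixels hold -1 and are "completed" with 0.
def toy_complete(frame: Frame, mask: Mask) -> Frame:
    return [0 if m else v for v, m in zip(frame, mask)]


def toy_predict_next_mask(filled: Frame, next_frame: Frame) -> Mask:
    return [v == -1 for v in next_frame]


toy_frames = [[1, -1, 3], [-1, 2, 3]]
toy_completed, toy_masks = inpaint_video(
    toy_frames, [False, True, False], toy_complete, toy_predict_next_mask
)
```

In this sketch the two components depend on each other at inference time: a poor mask prediction degrades the next completion, and a poor completion degrades the next mask prediction. This mutual dependence is the kind of coupling that the abstract's cycle consistency loss is said to regularize during training; the loss itself is not sketched here because the abstract does not specify its form.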
