Flow-Guided Transformer for Video Inpainting

Paper title

Flow-Guided Transformer for Video Inpainting

Authors

Kaidong Zhang, Jingjing Fu, Dong Liu

Abstract

We propose a flow-guided transformer that leverages the motion discrepancy exposed by optical flows to guide attention retrieval in the transformer for high-fidelity video inpainting. Specifically, we design a novel flow completion network that completes corrupted flows by exploiting relevant flow features in a local temporal window. With the completed flows, we propagate content across video frames and adopt the flow-guided transformer to synthesize the remaining corrupted regions. We decouple the transformers along the temporal and spatial dimensions so that the locally relevant completed flows can easily be integrated to guide spatial attention only. Furthermore, we design a flow-reweight module to precisely control the impact of the completed flows on each spatial transformer. For efficiency, we introduce a window partition strategy to both the spatial and temporal transformers. In the spatial transformer in particular, we design a dual-perspective spatial MHSA, which integrates global tokens into the window-based attention. Extensive experiments demonstrate the effectiveness of the proposed method both qualitatively and quantitatively. Code is available at https://github.com/hitachinsk/FGT.
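The propagation step mentioned in the abstract (copying known content across frames along completed flows) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function name `propagate_with_flow`, the flow convention (each target pixel stores a `(dx, dy)` offset to its location in the source frame), and the nearest-neighbour lookup are all assumptions made for illustration. Pixels that cannot be filled this way stay masked; in the paper these remaining regions are the ones the transformer synthesizes.

```python
import numpy as np

def propagate_with_flow(target, target_mask, source, source_mask, flow):
    """Fill masked pixels of `target` by following `flow` into `source`.

    Hypothetical helper for illustration only (not from the FGT code base).
    target, source: (H, W, C) frames.
    target_mask, source_mask: (H, W) bool arrays, True where pixels are missing.
    flow: (H, W, 2) array; flow[y, x] = (dx, dy) is the assumed offset from a
          target pixel to its corresponding location in the source frame.
    Returns the filled frame and the mask of pixels that are still missing.
    """
    h, w = target_mask.shape
    ys, xs = np.nonzero(target_mask)

    # Nearest-neighbour lookup of the corresponding source location.
    sx = np.rint(xs + flow[ys, xs, 0]).astype(int)
    sy = np.rint(ys + flow[ys, xs, 1]).astype(int)

    # A match is usable only if it lands inside the frame
    # and on a source pixel that is itself known.
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    valid = inside.copy()
    valid[inside] = ~source_mask[sy[inside], sx[inside]]

    filled = target.copy()
    filled[ys[valid], xs[valid]] = source[sy[valid], sx[valid]]

    remaining = target_mask.copy()
    remaining[ys[valid], xs[valid]] = False
    return filled, remaining
```

Calling this once per neighbouring frame and passing the returned `remaining` mask into the next call would shrink the hole step by step, leaving a residual region for a generative model to complete.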
