Paper Title
VPTR: Efficient Transformers for Video Prediction
Paper Authors
Paper Abstract
In this paper, we propose a new Transformer block for video future frame prediction based on an efficient local spatial-temporal separated attention mechanism. Based on this new Transformer block, a fully autoregressive video future frame prediction Transformer is proposed. In addition, a non-autoregressive video prediction Transformer is proposed to increase inference speed and to reduce the accumulated inference errors of its autoregressive counterpart. To avoid predicting future frames that are all very similar, a contrastive feature loss is applied to maximize the mutual information between predicted and ground-truth future frame features. This work is the first to formally compare these two types of attention-based video future frame prediction models across different scenarios. The proposed models are competitive with more complex state-of-the-art models. The source code is available at https://github.com/XiYe20/VPTR.
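The spatial-temporal separated attention factorizes full spatio-temporal attention into a cheaper spatial pass within each frame followed by a temporal pass across frames. Below is a minimal PyTorch sketch of that factorization, not the authors' exact block: the class name STSeparatedAttention and all shapes are illustrative assumptions, and for brevity the spatial pass attends over the whole frame rather than the local windows the abstract refers to.

```python
import torch
import torch.nn as nn

class STSeparatedAttention(nn.Module):
    """Factorized attention: spatial attention within each frame,
    then temporal attention at each spatial location."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, T, H, W, C) feature maps for T frames
        B, T, H, W, C = x.shape
        # Spatial pass: tokens are the H*W positions of a single frame.
        s = x.reshape(B * T, H * W, C)
        n1 = self.norm1(s)
        s = s + self.spatial_attn(n1, n1, n1)[0]
        # Temporal pass: tokens are the T frames at a single position.
        t = s.reshape(B, T, H * W, C).permute(0, 2, 1, 3).reshape(B * H * W, T, C)
        n2 = self.norm2(t)
        t = t + self.temporal_attn(n2, n2, n2)[0]
        return t.reshape(B, H * W, T, C).permute(0, 2, 1, 3).reshape(B, T, H, W, C)

if __name__ == "__main__":
    x = torch.randn(2, 4, 8, 8, 64)  # 2 clips, 4 frames, 8x8 feature maps, 64 channels
    y = STSeparatedAttention(dim=64, num_heads=4)(x)
    print(y.shape)  # torch.Size([2, 4, 8, 8, 64])
```

The factorization is what makes the block efficient: joint attention over all T*H*W tokens costs O((T*H*W)^2), while the separated passes cost O(T*(H*W)^2 + H*W*T^2), and restricting the spatial pass to local windows reduces this further.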
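The difference between the two predictors compared in the paper comes down to the inference loop. A hedged sketch follows, assuming a generic ar_model that maps the frames seen so far to next-step predictions and a nar_model that emits all future frames in one pass; both signatures are assumptions for illustration only.

```python
import torch

@torch.no_grad()
def predict_autoregressive(ar_model, past, n_future):
    """One frame per step; each output is fed back as input,
    so inference is slower and errors accumulate over the horizon."""
    frames = list(past.unbind(dim=1))                 # past: (B, T_past, C, H, W)
    for _ in range(n_future):
        next_frame = ar_model(torch.stack(frames, dim=1))[:, -1]  # last time step
        frames.append(next_frame)
    return torch.stack(frames[past.size(1):], dim=1)  # (B, n_future, C, H, W)

@torch.no_grad()
def predict_non_autoregressive(nar_model, past, n_future):
    """All future frames in a single forward pass: faster, and free
    of the accumulated feedback errors of the autoregressive loop."""
    return nar_model(past, n_future)                  # (B, n_future, C, H, W)
```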
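The contrastive feature loss can be read as an InfoNCE-style objective: a predicted frame feature should be closer to its own ground-truth feature than to the features of other samples in the batch. The sketch below is an illustrative stand-in for the loss described in the abstract, not its exact published form; temperature is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def contrastive_feature_loss(pred_feats, gt_feats, temperature=0.1):
    # pred_feats, gt_feats: (N, D) flattened frame features
    pred = F.normalize(pred_feats, dim=-1)
    gt = F.normalize(gt_feats, dim=-1)
    logits = pred @ gt.t() / temperature                # (N, N) similarity matrix
    targets = torch.arange(pred.size(0), device=pred.device)
    # Diagonal entries are the positive pairs; minimizing this cross-entropy
    # maximizes a lower bound on the mutual information between the two
    # feature sets, discouraging near-identical predicted frames.
    return F.cross_entropy(logits, targets)
```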