Paper Title

Per-Clip Video Object Segmentation

Paper Authors

Kwanyong Park, Sanghyun Woo, Seoung Wug Oh, In So Kweon, Joon-Young Lee

Paper Abstract

Recently, memory-based approaches show promising results on semi-supervised video object segmentation. These methods predict object masks frame-by-frame with the help of a frequently updated memory of the previous mask. Different from this per-frame inference, we investigate an alternative perspective by treating video object segmentation as clip-wise mask propagation. In this per-clip inference scheme, we update the memory with an interval and simultaneously process a set of consecutive frames (i.e., a clip) between the memory updates. The scheme provides two potential benefits: accuracy gain by clip-level optimization and efficiency gain by parallel computation of multiple frames. To this end, we propose a new method tailored for per-clip inference. Specifically, we first introduce a clip-wise operation to refine the features based on intra-clip correlation. In addition, we employ a progressive matching mechanism for efficient information-passing within a clip. With the synergy of the two modules and a newly proposed per-clip based training, our network achieves state-of-the-art performance on YouTube-VOS 2018/2019 val (84.6% and 84.6%) and DAVIS 2016/2017 val (91.9% and 86.1%). Furthermore, our model shows a great speed-accuracy trade-off with varying memory update intervals, which leads to huge flexibility.
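
To make the per-clip scheme concrete, below is a minimal Python sketch of the inference loop described in the abstract. The function names (segment_clip, per_clip_inference) and the toy mask-copying behavior are illustrative assumptions, not the authors' implementation; the point is only the control flow, where memory is updated once per clip rather than once per frame, and all frames in a clip are segmented together.

```python
from typing import Any, List, Tuple

def segment_clip(clip: List[Any], memory: List[Tuple[Any, Any]]) -> List[Any]:
    """Hypothetical stand-in for the segmentation network: jointly segments
    all frames of a clip against the current memory. Here it naively copies
    the most recent mask so the sketch runs end to end; in the paper, this
    step involves intra-clip correlation and progressive matching."""
    last_mask = memory[-1][1]
    return [last_mask for _ in clip]

def per_clip_inference(frames: List[Any], first_mask: Any, clip_len: int = 5) -> List[Any]:
    """Propagate the first-frame mask clip-by-clip: memory is updated once
    per clip (the update interval), and the frames within a clip are
    processed together rather than one at a time."""
    memory = [(frames[0], first_mask)]              # memory holds (frame, mask) pairs
    masks = [first_mask]
    for start in range(1, len(frames), clip_len):
        clip = frames[start:start + clip_len]       # consecutive frames = one clip
        clip_masks = segment_clip(clip, memory)     # clip-level, parallelizable
        masks.extend(clip_masks)
        memory.append((clip[-1], clip_masks[-1]))   # one memory update per clip
    return masks

# A larger clip_len means fewer memory updates (faster); a smaller clip_len
# approaches per-frame inference (fresher memory) -- the speed-accuracy
# trade-off the abstract mentions.
masks = per_clip_inference(list(range(30)), first_mask="mask0", clip_len=5)
assert len(masks) == 30
```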
