TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

Paper Title

TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

Authors

Hyojin Park, Ganesh Venkatesh, Nojun Kwak

Abstract

Semi-supervised video object segmentation (semi-VOS) is widely used in many applications. The task is to track class-agnostic objects given an initial target mask. Various approaches have been developed for it based on online learning, memory networks, and optical flow. These methods achieve high accuracy but are hard to use in real-world applications because of their slow inference and high complexity. Template matching methods were devised to address this with fast processing, but previous models of this kind sacrifice considerable accuracy. We introduce a novel semi-VOS model based on a template matching method and a temporal consistency loss that narrows the performance gap to heavy models while substantially speeding up inference. Our template matching method consists of short-term and long-term matching. Short-term matching enhances target object localization, while long-term matching improves fine details and handles changes in object shape through the newly proposed adaptive template attention module. However, long-term matching causes error propagation, because past estimated results flow into the template when it is updated. To mitigate this problem, we also propose a temporal consistency loss that improves temporal coherence between neighboring frames by adopting the concept of a transition matrix. Our model obtains a 79.5% J&F score at 73.8 FPS on the DAVIS16 benchmark. The code is available at https://github.com/HYOJINPARK/TTVOS.
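
For readers unfamiliar with the metric quoted above: on DAVIS, the J&F score is the mean of region similarity J (the intersection over union of the predicted and ground-truth masks) and contour accuracy F. The following is a minimal sketch of J only, written for illustration; the function name `jaccard` and the list-based mask layout are assumptions made here, not code from the TTVOS repository, and F is omitted.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Both masks are equally sized 2D lists of 0/1 values (an assumed layout).
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
print(jaccard(pred, gt))  # 0.5
```

A benchmark score such as the one reported above would average this per-frame value over all annotated frames and sequences, then combine it with F.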
