Paper Title

Two-Level Temporal Relation Model for Online Video Instance Segmentation

Authors

Çağan Selim Çoban, Oğuzhan Keskin, Jordi Pont-Tuset, Fatma Güney

Abstract

In Video Instance Segmentation (VIS), current approaches either focus on the quality of the results, by taking the whole video as input and processing it offline; or on speed, by handling it frame by frame at the cost of competitive performance. In this work, we propose an online method that is on par with the performance of the offline counterparts. We introduce a message-passing graph neural network that encodes objects and relates them through time. We additionally propose a novel module to fuse features from the feature pyramid network with residual connections. Our model, trained end-to-end, achieves state-of-the-art performance on the YouTube-VIS dataset among online methods. Further experiments on DAVIS demonstrate the generalization capability of our model to the video object segmentation task. Code is available at: \url{https://github.com/caganselim/TLTM}.
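To make the core idea concrete, below is a minimal NumPy sketch of message passing between object nodes in consecutive frames, followed by similarity-based association. The weight matrices, softmax-weighted aggregation, residual update, and greedy cosine matching are illustrative assumptions for this sketch, not the paper's exact formulation (see the linked repository for the actual model).

```python
import numpy as np

def message_passing_step(prev_nodes, curr_nodes, w_msg, w_upd):
    """One message-passing step from previous-frame nodes to current-frame nodes.

    prev_nodes: (M, D) object embeddings from the previous frame
    curr_nodes: (N, D) object embeddings from the current frame
    w_msg, w_upd: (D, D) learnable weight matrices (random here for illustration)
    """
    # Affinities between every current-frame node and every previous-frame node.
    sim = curr_nodes @ prev_nodes.T                        # (N, M)
    # Softmax over previous-frame nodes: each current node attends to the past.
    attn = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    # Aggregate transformed messages, then apply a residual update.
    messages = attn @ (prev_nodes @ w_msg)                 # (N, D)
    return curr_nodes + np.tanh(messages @ w_upd)          # (N, D)

def associate(prev_nodes, curr_nodes):
    """Greedily match each current-frame object to a previous-frame object
    by cosine similarity, yielding instance identities over time."""
    a = prev_nodes / np.linalg.norm(prev_nodes, axis=1, keepdims=True)
    b = curr_nodes / np.linalg.norm(curr_nodes, axis=1, keepdims=True)
    return np.argmax(b @ a.T, axis=1)                      # (N,) matched indices
```

In an online setting, this step runs once per incoming frame: the updated current-frame embeddings become the "previous" nodes for the next frame, so identities propagate without ever seeing the full video.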
