LVOS: A Benchmark for Long-term Video Object Segmentation

Paper Title

LVOS: A Benchmark for Long-term Video Object Segmentation

Authors

Lingyi Hong, Wenchao Chen, Zhongying Liu, Wei Zhang, Pinxue Guo, Zhaoyu Chen, Wenqiang Zhang

Abstract

Existing video object segmentation (VOS) benchmarks focus on short-term videos that last only about 3-5 seconds and in which objects are visible most of the time. These videos are poorly representative of practical applications, and the absence of long-term datasets restricts further investigation of VOS in realistic scenarios. In this paper, we present a new benchmark dataset named LVOS, which consists of 220 videos with a total duration of 421 minutes. To the best of our knowledge, LVOS is the first densely annotated long-term VOS dataset. The videos in LVOS last 1.59 minutes on average, which is 20 times longer than videos in existing VOS datasets. Each video includes various attributes, especially challenges arising in the wild, such as long-term reappearance and cross-temporal similar objects. Based on LVOS, we assess existing video object segmentation algorithms and propose a Diverse Dynamic Memory network (DDMemory) that consists of three complementary memory banks to exploit temporal information adequately. The experimental results demonstrate the strengths and weaknesses of prior methods, pointing to promising directions for further study. Data and code are available at https://lingyihongfd.github.io/lvos.github.io/.
