Paper title
Memory Aggregation Networks for Efficient Interactive Video Object Segmentation
Paper authors
Paper abstract
Interactive video object segmentation (iVOS) aims to efficiently obtain high-quality segmentation masks of a target object in a video through user interactions. Most previous state-of-the-art methods tackle iVOS with two independent networks, one for handling user interaction and one for temporal propagation, which leads to inefficiency at inference time. In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS task more efficiently. MA-Net integrates the interaction and propagation operations into a single network, which significantly improves the efficiency of iVOS in the multi-round interaction setting. More importantly, we propose a simple yet effective memory aggregation mechanism that records informative knowledge from previous interaction rounds, greatly improving robustness in discovering challenging objects of interest. We conduct extensive experiments on the validation set of the DAVIS Challenge 2018 benchmark. In particular, MA-Net achieves a J@60 score of 76.1% without any bells and whistles, outperforming state-of-the-art methods by more than 2.7%.
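For readers unfamiliar with the metric: the J in J@60 is, as we understand the DAVIS benchmark, the Jaccard index (region similarity), i.e. the intersection over union of a predicted mask and its ground-truth mask, and the @60 refers to the score reached within a 60-second interaction budget. The sketch below only illustrates the per-frame J and its average over a sequence. It is not the official evaluation code; the function names and the nested-list mask representation are assumptions made to keep the example dependency-free, and the time-budget part of the metric is not modeled.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Masks are equal-sized nested lists of 0/1 values (a hypothetical
    representation chosen for this sketch).
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_jaccard(pred_masks, gt_masks):
    """Average J over the frames of one video sequence."""
    scores = [jaccard(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(scores) / len(scores)
```

Under these assumptions, calling `mean_jaccard` on the masks produced after the final interaction round gives a single number in [0, 1] for the sequence, which would be reported as a percentage.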