Title
Motion-inductive Self-supervised Object Discovery in Videos
Authors
Abstract
In this paper, we consider the task of unsupervised object discovery in videos. Previous works have shown promising results by processing optical flow to segment objects. However, taking flow as input brings two drawbacks. First, flow cannot capture sufficient cues when objects remain static or are partially occluded. Second, it is challenging to establish temporal coherency from flow-only input, due to the missing texture information. To tackle these limitations, we propose a model that directly processes consecutive RGB frames and infers the optical flow between any pair of frames using a layered representation, with the opacity channels treated as the segmentation. Additionally, to enforce object permanence, we apply a temporal consistency loss on the masks inferred from randomly paired frames, which correspond to motions at different paces, encouraging the model to segment objects even if they do not move at the current time point. Experimentally, we demonstrate superior performance over previous state-of-the-art methods on three public video segmentation datasets (DAVIS2016, SegTrackv2, and FBMS-59), while remaining computationally efficient by avoiding the overhead of computing optical flow as input.
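To make the temporal consistency idea concrete, the snippet below is a minimal, hypothetical PyTorch sketch, not the authors' released code: the names `temporal_consistency_loss`, `sample_pairs_and_loss`, and the two-frame `model` interface are assumptions chosen only to illustrate how masks inferred for the same reference frame from two randomly sampled partner frames (i.e., motions at different paces) can be encouraged to agree.

```python
# Minimal sketch (under stated assumptions) of the temporal-consistency term
# described in the abstract. All names and interfaces here are hypothetical;
# the actual method's losses and model signatures may differ.
import torch


def temporal_consistency_loss(mask_a: torch.Tensor, mask_b: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between opacity masks of the same reference frame,
    each inferred from a different, randomly sampled partner frame."""
    # mask_a, mask_b: (B, 1, H, W) opacity (alpha) channels in [0, 1]
    return (mask_a - mask_b).abs().mean()


def sample_pairs_and_loss(model, frames: torch.Tensor) -> torch.Tensor:
    """frames: (B, T, 3, H, W) consecutive RGB frames.

    Two partner frames at different temporal gaps correspond to motions at
    different paces; the masks predicted for the shared reference frame
    should nevertheless agree, which enforces object permanence.
    """
    B, T = frames.shape[:2]
    ref = frames[:, 0]
    idx_a = int(torch.randint(1, T, (1,)))
    idx_b = int(torch.randint(1, T, (1,)))
    # Hypothetical interface: the model takes a frame pair and returns the
    # opacity mask (segmentation) and the reconstructed flow between them.
    mask_a, _ = model(ref, frames[:, idx_a])
    mask_b, _ = model(ref, frames[:, idx_b])
    return temporal_consistency_loss(mask_a, mask_b)
```

In this sketch the loss is a simple L1 distance between the two masks; the point is only that the supervision signal comes from agreement across frame pairings rather than from flow computed as an input.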