Paper Title

DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection

Authors

Jing Liu, Jiaxiang Wang, Weikang Wang, Yuting Su

Abstract

As moving objects always draw more attention from human eyes, temporal motion information is commonly exploited as a complement to spatial information when detecting salient objects in videos. Although efficient tools such as optical flow have been proposed to extract temporal motion information, they often encounter difficulties in saliency detection due to camera movement or the partial movement of salient objects. In this paper, we investigate the complementary roles of spatial and temporal information and propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatiotemporal information. We construct a symmetric two-bypass network to explicitly extract spatial and temporal features. A dynamic weight generator (DWG) is designed to automatically learn the reliability of the corresponding saliency branch, and a top-down cross-attentive aggregation (CAA) procedure is designed to facilitate dynamic complementary aggregation of spatiotemporal features. Finally, the features are refined by spatial attention under the guidance of a coarse saliency map and then passed through the decoder to produce the final saliency map. Experimental results on five benchmarks, VOS, DAVIS, FBMS, SegTrack-v2, and ViSal, demonstrate that the proposed method outperforms state-of-the-art algorithms. The source code is available at https://github.com/TJUMMG/DS-Net.
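
For illustration, below is a minimal PyTorch sketch of the dynamic weighting idea the abstract describes: a generator predicts a reliability weight for each of the two branches, and the fusion takes a weighted sum of spatial and temporal features. The module names (DynamicWeightGenerator, DynamicFusion), the pooling/MLP layout, and the channel sizes are assumptions made for this sketch, not the authors' released implementation, which is available in the repository linked above.

```python
# Hypothetical sketch of dynamic two-branch fusion; layer layout and names
# are illustrative assumptions, not the DS-Net reference implementation.
import torch
import torch.nn as nn


class DynamicWeightGenerator(nn.Module):
    """Predicts one reliability weight per branch from global branch context."""

    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # summarize each branch globally
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 2),          # one logit per branch
        )

    def forward(self, spatial_feat, temporal_feat):
        s = self.pool(spatial_feat).flatten(1)    # (B, C)
        t = self.pool(temporal_feat).flatten(1)   # (B, C)
        w = torch.softmax(self.mlp(torch.cat([s, t], dim=1)), dim=1)
        return w[:, 0], w[:, 1]                   # weights sum to 1 per sample


class DynamicFusion(nn.Module):
    """Weighted aggregation of spatial and temporal branch features."""

    def __init__(self, channels: int):
        super().__init__()
        self.dwg = DynamicWeightGenerator(channels)

    def forward(self, spatial_feat, temporal_feat):
        w_s, w_t = self.dwg(spatial_feat, temporal_feat)
        w_s = w_s.view(-1, 1, 1, 1)               # broadcast over C, H, W
        w_t = w_t.view(-1, 1, 1, 1)
        return w_s * spatial_feat + w_t * temporal_feat


if __name__ == "__main__":
    fuse = DynamicFusion(channels=64)
    rgb_feat = torch.randn(2, 64, 56, 56)         # appearance-branch features
    flow_feat = torch.randn(2, 64, 56, 56)        # motion-branch features
    fused = fuse(rgb_feat, flow_feat)
    print(fused.shape)                            # torch.Size([2, 64, 56, 56])
```

The softmax keeps the two weights non-negative and summing to one, so fusion of this kind can lean toward the more reliable branch when, for example, optical flow is degraded by camera motion, which matches the motivation stated in the abstract.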
