Paper Title
Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance
Paper Authors
Paper Abstract
Unsupervised video object segmentation suffers from performance constraints that limit its large-scale application. To address this problem, we propose a method that incorporates motion representations into unsupervised video object segmentation, improving segmentation accuracy while reducing the computational cost of the network. The overall network consists of a dual-stream network, a motion guidance module, and a multi-scale progressive fusion module. The appearance and motion representations of the target are extracted by the dual-stream network. The motion guidance module then applies a local attention mechanism to the motion representation to obtain semantic features that guide the high-level semantic features of the appearance representation. Finally, the multi-scale progressive fusion module fuses the semantic features from different depths of the dual-stream network to further improve the segmentation performance of the overall network. We conduct extensive experiments on three datasets: DAVIS 16, FBMS, and ViSal. The results show that the proposed method achieves excellent accuracy and efficiency, demonstrating the superiority and robustness of the algorithm.
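To make the described pipeline concrete, the following is a minimal PyTorch-style sketch of a dual-stream network with a motion guidance module and a multi-scale progressive fusion head. It is an illustrative outline only, not the authors' implementation: the toy encoders, channel sizes, the sigmoid-gated guidance formulation, and all names (DualStreamUVOS, MotionGuidance, ProgressiveFusion) are assumptions made for this example.

# Minimal sketch of the described architecture (assumed PyTorch-style design).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionGuidance(nn.Module):
    """Local-attention-style guidance: motion features re-weight appearance features."""
    def __init__(self, channels):
        super().__init__()
        self.to_attn = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, appearance_feat, motion_feat):
        # Attention map derived from the motion stream (assumed formulation).
        attn = torch.sigmoid(self.to_attn(motion_feat))
        # Guide the appearance features with the motion-derived attention (residual form).
        return appearance_feat * attn + appearance_feat


class ProgressiveFusion(nn.Module):
    """Fuse multi-scale fused features coarse-to-fine into a segmentation mask."""
    def __init__(self, channels_list):
        super().__init__()
        self.reduce = nn.ModuleList(
            [nn.Conv2d(c, 64, kernel_size=1) for c in channels_list]
        )
        self.predict = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, feats):
        # feats: list of features ordered from deepest (smallest) to shallowest.
        out = self.reduce[0](feats[0])
        for reduce, feat in zip(self.reduce[1:], feats[1:]):
            out = F.interpolate(out, size=feat.shape[-2:], mode="bilinear",
                                align_corners=False)
            out = out + reduce(feat)
        return self.predict(out)


class DualStreamUVOS(nn.Module):
    """Two encoders (RGB frame, optical flow) + per-level motion guidance + fusion head."""
    def __init__(self, channels_list=(256, 128, 64)):
        super().__init__()
        self.appearance_encoder = self._make_encoder(3, channels_list)
        self.motion_encoder = self._make_encoder(2, channels_list)  # flow has 2 channels
        self.guidance = nn.ModuleList([MotionGuidance(c) for c in channels_list])
        self.fusion = ProgressiveFusion(list(channels_list))

    @staticmethod
    def _make_encoder(in_ch, channels_list):
        # Toy stand-in for a real backbone; returns one stage per feature level.
        layers, prev = nn.ModuleList(), in_ch
        for c in reversed(channels_list):  # shallow -> deep
            layers.append(nn.Sequential(
                nn.Conv2d(prev, c, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            prev = c
        return layers

    def _run_encoder(self, encoder, x):
        feats = []
        for stage in encoder:
            x = stage(x)
            feats.append(x)
        return feats[::-1]  # deepest first

    def forward(self, frame, flow):
        app_feats = self._run_encoder(self.appearance_encoder, frame)
        mot_feats = self._run_encoder(self.motion_encoder, flow)
        # Motion-guided appearance features at every level of the dual-stream network.
        fused = [g(a, m) for g, a, m in zip(self.guidance, app_feats, mot_feats)]
        mask_logits = self.fusion(fused)
        # Upsample the predicted mask to the input resolution.
        return F.interpolate(mask_logits, size=frame.shape[-2:], mode="bilinear",
                             align_corners=False)


# Usage example: one RGB frame and its optical flow field.
model = DualStreamUVOS()
frame = torch.randn(1, 3, 224, 224)
flow = torch.randn(1, 2, 224, 224)
print(model(frame, flow).shape)  # torch.Size([1, 1, 224, 224])

The design choice mirrored here is that motion acts only as guidance (an attention signal over appearance features) rather than as a second full decoding path, which is one way the computational cost can stay low while the motion cue still sharpens the segmentation.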