Paper Title
DFTR: Depth-supervised Fusion Transformer for Salient Object Detection
Authors
Abstract
Automated salient object detection (SOD) plays an increasingly crucial role in many computer vision applications. By reformulating depth information as supervision rather than as input, depth-supervised convolutional neural networks (CNNs) have achieved promising results in both RGB and RGB-D SOD scenarios, with the advantage of requiring neither an extra depth network nor depth inputs at the inference stage. This paper, for the first time, seeks to extend depth supervision to the Transformer architecture. Specifically, we develop a Depth-supervised Fusion TRansformer (DFTR) to further improve the accuracy of both RGB and RGB-D SOD. The proposed DFTR has three primary features: 1) to the best of our knowledge, DFTR is the first pure Transformer-based model for depth-supervised SOD; 2) a multi-scale feature aggregation (MFA) module is proposed to fully exploit the multi-scale features encoded by the Swin Transformer in a coarse-to-fine manner; 3) to enable bidirectional information flow across different feature streams, a novel multi-stage feature fusion (MFF) module is further integrated into DFTR, with an emphasis on salient regions at different network learning stages. We extensively evaluate the proposed DFTR on ten benchmark datasets. Experimental results show that DFTR consistently outperforms existing state-of-the-art methods on both RGB and RGB-D SOD tasks. The code and model will be made publicly available.
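The idea of using depth as supervision rather than as input can be illustrated with a minimal sketch. It is not the DFTR implementation: the loss terms (binary cross-entropy for saliency, L1 for depth), the `depth_weight` value, and the two-output model interface are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)

def l1(pred, target):
    """Mean absolute error over flat lists of values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def training_loss(sal_pred, sal_gt, depth_pred, depth_gt, depth_weight=0.5):
    """Hypothetical combined objective: the depth map appears only as a target.

    The network predicts depth from RGB as an auxiliary task, so the
    ground-truth depth is needed during training but never as an input.
    """
    return bce(sal_pred, sal_gt) + depth_weight * l1(depth_pred, depth_gt)

def inference(model, rgb):
    """At inference only RGB is fed in; the auxiliary depth output is discarded."""
    sal_pred, _depth_pred = model(rgb)
    return sal_pred
```

Under these assumptions, the depth term disappears from the deployed pipeline entirely, which is why no depth sensor or separate depth network is required at inference time.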