Paper Title

3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection

Authors

Yecheol Kim, Konyul Park, Minwook Kim, Dongsuk Kum, Jun Won Choi

Abstract

Fusing data from cameras and LiDAR sensors is an essential technique for achieving robust 3D object detection. One key challenge in camera-LiDAR fusion is mitigating the large domain gap between the two sensors, in terms of both coordinates and data distribution, when fusing their features. In this paper, we propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion, which is designed to mitigate the gap between the feature representations of camera and LiDAR data. The proposed method fuses features from the camera-view and 3D voxel-view domains and models their interactions through deformable attention. We redesign the transformer fusion encoder to aggregate information from the two domains. The two major changes are 1) dual query-based deformable attention, which fuses the dual-domain features interactively, and 2) 3D local self-attention, which encodes the voxel-domain queries prior to dual-query decoding. The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieves competitive performance on the KITTI and nuScenes datasets, with state-of-the-art performance in some 3D object detection benchmark categories.
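The dual-query deformable-attention idea described above can be sketched in plain NumPy: each query predicts a small set of sampling offsets around a reference point, bilinearly samples the other domain's feature map at those locations, and aggregates the samples with softmax weights. This is an illustrative toy (single head, one feature level, all weight names such as `W_off`/`W_attn` and the `dual_query_step` helper are hypothetical), not the authors' implementation:

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample a (H, W, C) feature map at continuous coords (x, y)."""
    H, W, _ = feat.shape
    x0 = min(max(int(np.floor(x)), 0), W - 1)
    y0 = min(max(int(np.floor(y)), 0), H - 1)
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot

def deformable_attend(query, ref_xy, feat, W_off, W_attn):
    """Single-head deformable attention: the query predicts K sampling
    offsets around its reference point plus a softmax weight per sample."""
    K = W_attn.shape[1]
    offsets = (query @ W_off).reshape(K, 2)        # (K, 2) learned offsets
    logits = query @ W_attn                        # (K,) per-sample logits
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                       # softmax over the K samples
    samples = np.stack([bilinear_sample(feat, ref_xy[0] + dx, ref_xy[1] + dy)
                        for dx, dy in offsets])    # (K, C)
    return weights @ samples                       # (C,) aggregated feature

def dual_query_step(voxel_q, cam_q, voxel_feat, cam_feat, bev_xy, img_xy, heads):
    """One interactive dual-query update: each domain's query gathers
    context from the other domain's feature map via deformable attention."""
    cam_ctx = deformable_attend(voxel_q, img_xy, cam_feat, *heads["v2c"])
    vox_ctx = deformable_attend(cam_q, bev_xy, voxel_feat, *heads["c2v"])
    return voxel_q + cam_ctx, cam_q + vox_ctx      # residual updates
```

In the real model these updates would run over many queries in parallel with multi-head attention and a projection between LiDAR voxel coordinates and image-plane reference points; the sketch only shows the per-query sampling-and-weighting mechanism.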
