Paper Title
Voxel Field Fusion for 3D Object Detection
Paper Authors
Paper Abstract
In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named Voxel Field Fusion. The proposed approach aims to maintain cross-modality consistency by representing and fusing augmented image features as rays in the voxel field. To this end, a learnable sampler is first designed to select vital features from the image plane, which are projected onto the voxel grid in a point-to-ray manner; this preserves consistency in the feature representation along with spatial context. In addition, ray-wise fusion is conducted to fuse features with supplemental context in the constructed voxel field. We further develop a mixed augmentor to align feature-variant transformations, which bridges the modality gap in data augmentation. The proposed framework is demonstrated to achieve consistent gains on various benchmarks and outperforms previous fusion-based methods on the KITTI and nuScenes datasets. Code is available at https://github.com/dvlab-research/VFF.
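To make the point-to-ray projection concrete, below is a minimal PyTorch sketch of the general idea: image features sampled at selected pixels are cast along their camera rays and scattered into a LiDAR voxel volume, which is then fused with the LiDAR features. This is an illustration under simplifying assumptions, not the authors' implementation: the function name `image_to_voxel_rays`, the uniform depth hypotheses (standing in for the paper's learnable sampler), and the additive fusion (standing in for ray-wise fusion) are all hypothetical choices made here for brevity.

```python
# Illustrative sketch (not the authors' implementation): project sampled image
# features along camera rays into a voxel grid and fuse them with LiDAR voxel
# features. Names, shapes, and the fusion rule are simplifying assumptions.
import torch


def image_to_voxel_rays(img_feats, pixel_uv, cam_intrinsics, cam_to_lidar,
                        voxel_size, grid_range, grid_shape, num_depth_bins=64):
    """Cast each sampled image feature along its viewing ray and scatter it
    into a voxel feature volume.

    img_feats:      (N, C)  features sampled at selected pixels
    pixel_uv:       (N, 2)  pixel coordinates of those samples
    cam_intrinsics: (3, 3)  camera intrinsic matrix K
    cam_to_lidar:   (4, 4)  extrinsic transform from camera to LiDAR frame
    grid_range:     (xmin, ymin, zmin) of the voxel grid in LiDAR coordinates
    grid_shape:     (X, Y, Z) number of voxels per axis
    """
    N, C = img_feats.shape
    X, Y, Z = grid_shape
    device = img_feats.device

    # Hypothetical depth hypotheses: place each ray's feature at uniform depths
    # (the paper instead learns where features should lie along the ray).
    depths = torch.linspace(1.0, 50.0, num_depth_bins, device=device)  # meters

    # Back-project pixels to camera-frame points at every depth hypothesis.
    uv1 = torch.cat([pixel_uv, torch.ones(N, 1, device=device)], dim=1)   # (N, 3)
    cam_dirs = uv1 @ torch.inverse(cam_intrinsics).T                      # (N, 3)
    cam_pts = cam_dirs[:, None, :] * depths[None, :, None]                # (N, D, 3)

    # Transform ray points into the LiDAR (voxel grid) frame.
    ones = torch.ones(N, num_depth_bins, 1, device=device)
    lidar_pts = (torch.cat([cam_pts, ones], dim=-1) @ cam_to_lidar.T)[..., :3]

    # Convert metric coordinates to voxel indices and keep in-range points.
    origin = torch.tensor(grid_range, device=device)
    vox_idx = ((lidar_pts - origin) / voxel_size).long()                  # (N, D, 3)
    bounds = torch.tensor([X, Y, Z], device=device)
    valid = ((vox_idx >= 0) & (vox_idx < bounds)).all(-1)                 # (N, D)

    # Scatter-add each feature into every voxel its ray passes through.
    volume = torch.zeros(C, X, Y, Z, device=device)
    idx = vox_idx[valid]                                                  # (M, 3)
    feats = img_feats[:, None, :].expand(N, num_depth_bins, C)[valid]     # (M, C)
    flat = idx[:, 0] * Y * Z + idx[:, 1] * Z + idx[:, 2]
    volume.view(C, -1).index_add_(1, flat, feats.T)
    return volume


# Toy usage: fuse the ray volume with a LiDAR voxel volume by addition,
# a stand-in for the paper's ray-wise fusion.
if __name__ == "__main__":
    C, grid_shape = 16, (40, 40, 8)
    lidar_volume = torch.zeros(C, *grid_shape)
    ray_volume = image_to_voxel_rays(
        img_feats=torch.randn(100, C),
        pixel_uv=torch.rand(100, 2) * torch.tensor([1280.0, 720.0]),
        cam_intrinsics=torch.tensor([[720.0, 0.0, 640.0],
                                     [0.0, 720.0, 360.0],
                                     [0.0, 0.0, 1.0]]),
        cam_to_lidar=torch.eye(4),
        voxel_size=1.0,
        grid_range=(-20.0, -20.0, -3.0),
        grid_shape=grid_shape,
    )
    fused = lidar_volume + ray_volume
    print(fused.shape)  # torch.Size([16, 40, 40, 8])
```

In this simplified form, every depth hypothesis along a ray receives the same image feature; the contribution of the learnable sampler and ray-wise fusion described in the abstract is precisely to weight and combine these ray samples rather than spreading them uniformly.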