Title

CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

Authors

Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H. Hsu

Abstract

To achieve accurate 3D object detection at low cost for autonomous driving, many multi-camera methods have been proposed that address the occlusion problem of monocular approaches. However, due to the lack of accurately estimated depth, existing multi-camera methods often generate multiple bounding boxes along a ray in the depth direction for difficult small objects such as pedestrians, resulting in extremely low recall. Furthermore, directly applying depth prediction modules to existing multi-camera methods, which are generally composed of large network architectures, cannot meet the real-time requirements of self-driving applications. To address these issues, we propose Cross-view and Depth-guided Transformers for 3D Object Detection (CrossDTR). First, our lightweight depth predictor is designed to produce precise object-wise sparse depth maps and low-dimensional depth embeddings without extra depth datasets during supervision. Second, a cross-view depth-guided transformer is developed to fuse the depth embeddings with image features from cameras of different views and generate 3D bounding boxes. Extensive experiments demonstrate that our method surpasses existing multi-camera methods by 10 percent in pedestrian detection and by about 3 percent in overall mAP and NDS metrics. Computational analyses also show that our method is 5 times faster than prior approaches. Our code will be made publicly available at https://github.com/sty61010/CrossDTR.
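The abstract describes a cross-view depth-guided transformer that fuses low-dimensional depth embeddings with multi-camera image features before decoding 3D bounding boxes. Below is a minimal sketch of one such decoder layer in a DETR-style setup with learnable object queries; the module name DepthGuidedDecoderLayer, the tensor shapes, and the additive fusion of depth embeddings into the keys/values are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch (not the authors' implementation) of a depth-guided decoder
# layer: object queries cross-attend to multi-view image features that are
# modulated by depth embeddings. All names, dimensions, and the fusion-by-
# addition choice are assumptions for illustration.
import torch
import torch.nn as nn


class DepthGuidedDecoderLayer(nn.Module):
    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.ReLU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.norm3 = nn.LayerNorm(embed_dim)

    def forward(self, queries, image_feats, depth_embeds):
        # queries:      (B, num_queries, C)     learnable 3D object queries
        # image_feats:  (B, num_views * HW, C)  flattened multi-camera features
        # depth_embeds: (B, num_views * HW, C)  depth embeddings projected to C
        q = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
        kv = image_feats + depth_embeds  # depth-guided keys/values (assumed fusion)
        q = self.norm2(q + self.cross_attn(q, kv, kv)[0])
        return self.norm3(q + self.ffn(q))


if __name__ == "__main__":
    B, V, HW, C, NQ = 1, 6, 600, 256, 900  # e.g. 6 surround-view cameras (nuScenes)
    layer = DepthGuidedDecoderLayer(C)
    out = layer(
        torch.randn(B, NQ, C),        # object queries
        torch.randn(B, V * HW, C),    # image features
        torch.randn(B, V * HW, C),    # depth embeddings
    )
    print(out.shape)  # torch.Size([1, 900, 256]) refined queries
```

In practice, the refined queries would be passed through regression and classification heads to predict 3D box parameters and class scores; that step is omitted here for brevity.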
