Paper Title
Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking
Authors
Abstract
Monocular image-based 3D perception has become an active research area in recent years owing to its applications in autonomous driving. Approaches to monocular 3D perception, including detection and tracking, however, often yield inferior performance compared to LiDAR-based techniques. Through systematic analysis, we identify per-object depth estimation accuracy as a major factor bounding the performance. Motivated by this observation, we propose a multi-level fusion method that combines different representations (RGB and pseudo-LiDAR) and temporal information across multiple frames for objects (tracklets) to enhance per-object depth estimation. Our proposed fusion method achieves state-of-the-art per-object depth estimation performance on the Waymo Open Dataset, the KITTI detection dataset, and the KITTI MOT dataset. We further demonstrate that by simply replacing estimated depth with fusion-enhanced depth, we can achieve significant improvements in monocular 3D perception tasks, including detection and tracking.
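The two-stage idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the fixed averaging weight stands in for the learned representation-level fusion of RGB and pseudo-LiDAR estimates, and the median stands in for the learned temporal fusion across a tracklet; all function names and values are hypothetical.

```python
import numpy as np

def fuse_depth(rgb_depth, plidar_depth, w_rgb=0.5):
    # Representation-level fusion: combine per-object depth estimates
    # from the RGB branch and the pseudo-LiDAR branch. A fixed weight
    # is used here; the paper learns this fusion end-to-end.
    return w_rgb * rgb_depth + (1.0 - w_rgb) * plidar_depth

def temporal_fuse(tracklet_depths):
    # Temporal fusion: aggregate the per-frame fused depths of one
    # tracked object. A robust median is a simple stand-in for the
    # paper's learned temporal module.
    return float(np.median(tracklet_depths))

# Toy tracklet: per-frame (RGB, pseudo-LiDAR) depth estimates in meters
# for the same object across three frames.
frames = [(20.1, 19.5), (20.4, 19.8), (21.0, 20.2)]
per_frame = [fuse_depth(r, p) for r, p in frames]
object_depth = temporal_fuse(per_frame)
```

In a detection or tracking pipeline, `object_depth` would then replace the single-frame depth estimate of the 3D box, which is the substitution the abstract reports as yielding the downstream gains.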