Paper Title
Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation
Paper Authors
Abstract
In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. Many recent works solve this problem by first recovering a point cloud with disparity estimation and then applying a 3D detector. In these approaches, the disparity map is computed for the entire image, which is costly and fails to leverage category-specific priors. In contrast, we design an instance disparity estimation network (iDispNet) that predicts disparity only for pixels on objects of interest and learns a category-specific shape prior for more accurate disparity estimation. To address the scarcity of disparity annotations during training, we propose to use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds, which makes our system more widely applicable. Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
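To make the pipeline concrete, below is a minimal sketch (not the authors' implementation) of how an instance disparity map restricted to an object mask can be back-projected into an object point cloud for a downstream 3D detector, using the standard rectified-stereo relation z = f*b/d. All names (instance_disparity_to_points, fx, baseline, cx, cy) and the example intrinsics are assumptions for illustration only.

# Minimal sketch: instance disparity -> object point cloud (assumed, not the paper's code)
import numpy as np

def instance_disparity_to_points(disparity, mask, fx, baseline, cx, cy):
    """Back-project pixels inside an instance mask into 3D camera coordinates.

    disparity: (H, W) predicted disparity in pixels (e.g. from an iDispNet-like network)
    mask:      (H, W) boolean instance mask from a 2D detector/segmenter
    fx, baseline, cx, cy: rectified-stereo intrinsics (assuming fx == fy)
    """
    v, u = np.nonzero(mask & (disparity > 0))   # pixel coordinates on the object only
    d = disparity[v, u]
    z = fx * baseline / d                       # depth from the stereo relation z = f*b/d
    x = (u - cx) * z / fx                       # pinhole back-projection
    y = (v - cy) * z / fx
    return np.stack([x, y, z], axis=1)          # (N, 3) object point cloud

if __name__ == "__main__":
    # Toy example with a constant-disparity square "object" and KITTI-like intrinsics (assumed values)
    H, W = 8, 8
    disp = np.full((H, W), 40.0)
    m = np.zeros((H, W), dtype=bool)
    m[2:6, 2:6] = True
    pts = instance_disparity_to_points(disp, m, fx=721.5, baseline=0.54, cx=4.0, cy=4.0)
    print(pts.shape, pts[:, 2].mean())          # depth ≈ 721.5 * 0.54 / 40 ≈ 9.74 m

Because only pixels inside the instance mask are back-projected, the point cloud fed to the 3D detector stays small compared with a full-image disparity map, which is the efficiency argument made in the abstract.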