Paper Title

DetMatch: Two Teachers are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection

Paper Authors

Park, Jinhyung; Xu, Chenfeng; Zhou, Yiyang; Tomizuka, Masayoshi; Zhan, Wei

Paper Abstract

While numerous 3D detection works leverage the complementary relationship between RGB images and point clouds, developments in the broader framework of semi-supervised object recognition remain uninfluenced by multi-modal fusion. Current methods develop independent pipelines for 2D and 3D semi-supervised learning despite the availability of paired image and point cloud frames. Observing that the distinct characteristics of each sensor cause them to be biased towards detecting different objects, we propose DetMatch, a flexible framework for joint semi-supervised learning on 2D and 3D modalities. By identifying objects detected in both sensors, our pipeline generates a cleaner, more robust set of pseudo-labels that both demonstrates stronger performance and stymies single-modality error propagation. Further, we leverage the richer semantics of RGB images to rectify incorrect 3D class predictions and improve localization of 3D boxes. Evaluating on the challenging KITTI and Waymo datasets, we improve upon strong semi-supervised learning methods and observe higher quality pseudo-labels. Code will be released at https://github.com/Divadi/DetMatch
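To make the core idea concrete, below is a minimal illustrative sketch (not the authors' implementation) of matching the two teachers' outputs: 3D teacher boxes are assumed to have already been projected into the image plane (the projection helper is hypothetical), and only detections that both the 2D and 3D teachers agree on, measured by 2D IoU under Hungarian matching, are kept as consensus pseudo-labels. The IoU threshold, the fused score, and the use of the 2D class to overwrite the 3D class are assumptions made for illustration.

```python
# Illustrative sketch of cross-modal pseudo-label matching (assumptions noted inline).
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_2d(box_a, box_b):
    """Axis-aligned IoU between two 2D boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-6)


def match_teacher_detections(boxes_2d, scores_2d, classes_2d,
                             boxes_3d_proj, scores_3d, classes_3d,
                             iou_thresh=0.5):
    """Hungarian-match 2D teacher boxes against projected 3D teacher boxes.

    Only matched (consensus) detections are returned as pseudo-label
    candidates. The 2D class label overwrites the 3D one, mirroring the
    paper's idea of using richer image semantics to rectify 3D class errors.
    """
    if len(boxes_2d) == 0 or len(boxes_3d_proj) == 0:
        return []
    # Cost matrix: negative IoU, so minimizing cost maximizes overlap.
    cost = np.zeros((len(boxes_2d), len(boxes_3d_proj)))
    for i, b2 in enumerate(boxes_2d):
        for j, b3 in enumerate(boxes_3d_proj):
            cost[i, j] = -iou_2d(b2, b3)
    rows, cols = linear_sum_assignment(cost)

    matches = []
    for i, j in zip(rows, cols):
        if -cost[i, j] >= iou_thresh:
            matches.append({
                "idx_2d": int(i),
                "idx_3d": int(j),
                # Fused confidence; a simple product is an assumption here.
                "score": float(scores_2d[i] * scores_3d[j]),
                # Rectify the 3D class prediction with the 2D one.
                "class": int(classes_2d[i]),
            })
    return matches
```

In this sketch, single-modality false positives (a box found by only one teacher) never survive the matching step, which is how consensus filtering can stymie error propagation from either modality alone.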
