Paper Title

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

Paper Authors

Mohamed Afham, Isuru Dissanayake, Dinithi Dissanayake, Amaya Dharmasiri, Kanchana Thilakarathna, Ranga Rodrigo

Paper Abstract

Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object classification, segmentation, and detection is often laborious owing to the irregular structure of point clouds. Self-supervised learning, which operates without any human labeling, is a promising approach to address this issue. We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations. It enables a 3D-2D correspondence of objects by maximizing agreement between point clouds and the corresponding rendered 2D images in an invariant space, while encouraging invariance to transformations in the point cloud modality. Our joint training objective combines the feature correspondences within and across modalities, thus assembling a rich learning signal from both the 3D point cloud and 2D image modalities in a self-supervised fashion. Experimental results show that our approach outperforms previous unsupervised learning methods on a diverse range of downstream tasks, including 3D object classification and segmentation. Further, ablation studies validate the potency of our approach for better point cloud understanding. Code and pretrained models are available at http://github.com/MohamedAfham/CrossPoint.
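
To make the joint objective concrete, below is a minimal PyTorch sketch of the two contrastive terms the abstract describes: an intra-modal term between two augmented views of a point cloud, and a cross-modal term between the point cloud and its rendered 2D image. The function names (`nt_xent`, `crosspoint_loss`), the prototype-averaging step, and the temperature value are illustrative assumptions, not the authors' released API; the embeddings are assumed to already come from the respective encoders and projection heads.

```python
import torch
import torch.nn.functional as F


def nt_xent(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent loss: z_a[i] and z_b[i] form a positive pair; all other
    rows in the batch act as negatives (symmetrized over both directions)."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature                 # (B, B) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def crosspoint_loss(z_pc1, z_pc2, z_img, temperature: float = 0.1):
    """Joint objective (illustrative): intra-modal agreement between two
    augmented point cloud views, plus cross-modal agreement between the
    averaged point cloud prototype and the rendered image embedding."""
    l_intra = nt_xent(z_pc1, z_pc2, temperature)         # invariance to point cloud transforms
    prototype = 0.5 * (z_pc1 + z_pc2)                    # transformation-invariant anchor (assumed)
    l_cross = nt_xent(prototype, z_img, temperature)     # 3D-2D correspondence
    return l_intra + l_cross


if __name__ == "__main__":
    # Toy usage: random tensors stand in for projection-head outputs.
    B, D = 8, 256
    z_pc1, z_pc2, z_img = (torch.randn(B, D) for _ in range(3))
    print(crosspoint_loss(z_pc1, z_pc2, z_img))          # scalar loss
```

Averaging the two augmented-view embeddings into a prototype before the cross-modal term follows the abstract's intuition of aligning the image with a representation of the object that is invariant to point cloud transformations; the released code at the URL above should be treated as the authoritative formulation.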
