Paper Title
Masked Discrimination for Self-Supervised Learning on Point Clouds
Paper Authors
Paper Abstract
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains. However, mask-based pretraining has yet to show benefits for point cloud understanding, likely because standard backbones like PointNet cannot properly handle the training-versus-testing distribution mismatch introduced by masking during training. In this paper, we bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds. Our key idea is to represent the point cloud as discrete occupancy values (1 if part of the point cloud; 0 if not), and to perform simple binary classification between masked object points and sampled noise points as the proxy task. In this way, our approach is robust to the point sampling variance in point clouds, and facilitates learning rich representations. We evaluate our pretrained models across several downstream tasks, including 3D shape classification, segmentation, and real-world object detection, and demonstrate state-of-the-art results while achieving a significant pretraining speedup (e.g., 4.1x on ScanNet) compared to the prior state-of-the-art Transformer baseline. Code is available at https://github.com/haotian-liu/MaskPoint.
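As a rough illustration of the occupancy-based proxy task described in the abstract, the minimal PyTorch sketch below samples real queries from the masked object points (occupancy 1) and uniform noise queries from the normalized bounding volume (occupancy 0), then trains a binary classifier on them. The names build_occupancy_queries, pretrain_step, encoder, decoder, and num_noise are hypothetical placeholders for illustration only, not the released MaskPoint API.

import torch
import torch.nn as nn

def build_occupancy_queries(masked_points, num_noise, noise_range=1.0):
    """Construct query points and binary occupancy labels for the proxy task.

    Real queries are the masked-out object points (label 1); fake queries are
    uniform noise in the normalized bounding volume (label 0).
    """
    B, M, _ = masked_points.shape
    device = masked_points.device
    real = masked_points                                           # (B, M, 3), occupancy 1
    fake = (torch.rand(B, num_noise, 3, device=device) * 2 - 1) * noise_range  # (B, N, 3), occupancy 0
    queries = torch.cat([real, fake], dim=1)                       # (B, M+N, 3)
    labels = torch.cat([torch.ones(B, M, device=device),
                        torch.zeros(B, num_noise, device=device)], dim=1)      # (B, M+N)
    return queries, labels

def pretrain_step(encoder, decoder, visible_points, masked_points, num_noise=2048):
    # `encoder` embeds the visible (unmasked) patches into latent tokens;
    # `decoder` attends from query points to those tokens and outputs one
    # occupancy logit per query. Both are stand-in module names.
    tokens = encoder(visible_points)                               # (B, T, C)
    queries, labels = build_occupancy_queries(masked_points, num_noise)
    logits = decoder(queries, tokens).squeeze(-1)                  # (B, M+N)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    return loss

Because supervision comes from occupancy labels rather than exact point coordinates, the loss does not penalize the network for the sampling variance of the reconstructed points, which is the robustness property the abstract highlights.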