Paper Title


POS-BERT: Point Cloud One-Stage BERT Pre-Training

Paper Authors

Kexue Fu, Peng Gao, Shaolei Liu, Renrui Zhang, Yu Qiao, Manning Wang

Paper Abstract


Recently, the pre-training paradigm combining Transformers and masked language modeling, exemplified by BERT, has achieved tremendous success in NLP, images, and point clouds. However, directly extending BERT from NLP to point clouds requires training a fixed discrete Variational AutoEncoder (dVAE) before pre-training, which results in a complex two-stage method called Point-BERT. Inspired by BERT and MoCo, we propose POS-BERT, a one-stage BERT pre-training method for point clouds. Specifically, we use a masked patch modeling (MPM) task to perform point cloud pre-training, which aims to recover the information of masked patches under the supervision of the corresponding tokenizer outputs. Unlike Point-BERT, whose tokenizer is trained separately and then frozen, we propose to use a dynamically updated momentum encoder as the tokenizer, which is updated along with the training process and outputs dynamic supervision signals. Furthermore, in order to learn high-level semantic representations, we incorporate contrastive learning to maximize the class-token consistency between differently transformed point clouds. Extensive experiments demonstrate that POS-BERT can extract high-quality pre-training features and improve performance on downstream tasks. Using the pre-trained model without any fine-tuning to extract features and training a linear SVM on ModelNet40, POS-BERT achieves state-of-the-art classification accuracy, exceeding Point-BERT by 3.5\%. In addition, our approach significantly improves many downstream tasks, such as fine-tuned classification, few-shot classification, and part segmentation. The code and trained models will be available at: \url{https://github.com/fukexue/POS-BERT}.
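
The abstract describes two training signals: a masked patch modeling loss supervised by a momentum-encoder tokenizer, and a class-token consistency loss between transformed views of the same point cloud. The sketch below illustrates that idea in PyTorch under stated assumptions; it is not the authors' implementation. All names (PointEncoder, ema_momentum, mask_ratio, training_step) and the exact loss forms are hypothetical placeholders, and the consistency term is shown as a simple negative cosine similarity rather than the paper's full contrastive formulation.

```python
# Minimal sketch (not the authors' code) of one-stage BERT-style pre-training with a
# momentum-encoder tokenizer, assuming point-cloud patches have already been grouped
# and embedded into (B, G, dim) tensors. All hyperparameters are illustrative.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointEncoder(nn.Module):
    """Stand-in Transformer encoder: patch embeddings (B, G, dim) -> (class token, patch tokens)."""
    def __init__(self, dim=384, depth=4, heads=6):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, patch_emb):
        x = torch.cat([self.cls.expand(patch_emb.size(0), -1, -1), patch_emb], dim=1)
        x = self.blocks(x)
        return x[:, 0], x[:, 1:]  # class token, per-patch tokens


student = PointEncoder()
teacher = copy.deepcopy(student)          # momentum encoder used as the tokenizer
for p in teacher.parameters():
    p.requires_grad_(False)

ema_momentum, mask_ratio = 0.999, 0.6     # illustrative values


def training_step(patch_emb_view1, patch_emb_view2):
    """One step of masked patch modeling + class-token consistency (optimizer step omitted)."""
    B, G, _ = patch_emb_view1.shape

    # 1) Teacher produces dynamic supervision targets from the unmasked input.
    with torch.no_grad():
        t_cls, t_tokens = teacher(patch_emb_view1)

    # 2) Student sees a masked version of the same view.
    mask = torch.rand(B, G, device=patch_emb_view1.device) < mask_ratio
    masked = patch_emb_view1.masked_fill(mask.unsqueeze(-1), 0.0)
    _, s_tokens = student(masked)

    # Masked patch modeling: match teacher tokens at masked positions.
    mpm_loss = F.mse_loss(s_tokens[mask], t_tokens[mask])

    # 3) Class-token consistency between two transformed views of the same cloud.
    s_cls2, _ = student(patch_emb_view2)
    cons_loss = -F.cosine_similarity(s_cls2, t_cls, dim=-1).mean()

    loss = mpm_loss + cons_loss
    loss.backward()

    # 4) EMA update keeps the tokenizer evolving with the student (no separate dVAE stage).
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(ema_momentum).add_(ps, alpha=1 - ema_momentum)
    return loss.item()


# Example usage with random patch embeddings standing in for two augmented views:
# loss = training_step(torch.randn(8, 64, 384), torch.randn(8, 64, 384))
```

For the linear evaluation mentioned in the abstract, the frozen class-token features extracted on ModelNet40 would typically be fed to an off-the-shelf linear SVM (e.g. sklearn.svm.LinearSVC); the reported 3.5\% improvement over Point-BERT refers to that protocol.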
