Paper Title
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
Paper Authors
Paper Abstract
As a pioneering work exploring transformer architectures for 3D point cloud understanding, Point Transformer achieves impressive results on multiple highly competitive benchmarks. In this work, we analyze the limitations of Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome those limitations. In particular, we first propose grouped vector attention, which is more effective than the previous version of vector attention. Inheriting the advantages of both learnable weight encoding and multi-head attention, we present a highly effective implementation of grouped vector attention with a novel grouped weight encoding layer. We also strengthen the position information for attention with an additional position encoding multiplier. Furthermore, we design novel and lightweight partition-based pooling methods which enable better spatial alignment and more efficient sampling. Extensive experiments show that our model outperforms its predecessor and achieves state-of-the-art results on several challenging 3D point cloud understanding benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and 3D point cloud classification on ModelNet40. Our code will be available at https://github.com/Gofinge/PointTransformerV2.
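To make the two headline ideas concrete, below is a minimal NumPy sketch of (1) grouped vector attention, where value channels are split into groups and each group shares one attention weight produced by a weight-encoding function, and (2) grid-based partition pooling, where points falling in the same uniform grid cell are fused. This is an illustrative simplification, not the paper's implementation: the grouped weight encoding is stood in for by a single linear projection (`w_enc`), the position encoding terms are omitted, and all shapes and names are assumptions.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_vector_attention(q, k, v, w_enc, groups):
    """Simplified grouped vector attention (position encoding omitted).

    q: (n, c) query features; k, v: (n, m, c) per-neighbor key/value features;
    w_enc: (c, groups) linear stand-in for the grouped weight encoding MLP.
    Channels are split into `groups`; neighbors in each group share one weight.
    """
    n, m, c = v.shape
    assert c % groups == 0
    r = q[:, None, :] - k                        # subtraction relation, (n, m, c)
    scores = r @ w_enc                           # one score per group, (n, m, groups)
    w = softmax(scores, axis=1)                  # normalize over the m neighbors
    v_g = v.reshape(n, m, groups, c // groups)   # split value channels into groups
    out = (w[..., None] * v_g).sum(axis=1)       # group-weighted sum over neighbors
    return out.reshape(n, c)

def grid_pool(coords, feats, grid_size):
    """Simplified partition-based pooling: average features per uniform grid cell."""
    cell = np.floor(coords / grid_size).astype(np.int64)
    _, inverse, counts = np.unique(cell, axis=0,
                                   return_inverse=True, return_counts=True)
    pooled = np.zeros((counts.size, feats.shape[1]))
    np.add.at(pooled, inverse, feats)            # sum features within each cell
    return pooled / counts[:, None]              # mean per cell

rng = np.random.default_rng(0)
n, m, c, g = 4, 8, 16, 4
out = grouped_vector_attention(rng.standard_normal((n, c)),
                               rng.standard_normal((n, m, c)),
                               rng.standard_normal((n, m, c)),
                               rng.standard_normal((c, g)), g)
pooled = grid_pool(rng.uniform(0, 4, (32, 3)), rng.standard_normal((32, c)), 2.0)
print(out.shape)     # (4, 16)
```

Unlike sampling-based pooling (e.g. farthest point sampling plus kNN grouping), the grid partition keeps pooled points spatially aligned across the hierarchy and needs no neighbor search, which is where the claimed efficiency gain comes from.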