3D Vision with Transformers: A Survey

Paper Title

3D Vision with Transformers: A Survey

Authors

Lahoud, Jean, Cao, Jiale, Khan, Fahad Shahbaz, Cholakkal, Hisham, Anwer, Rao Muhammad, Khan, Salman, Yang, Ming-Hsuan

Abstract

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its ability to learn long-range dependencies. This replacement has proven successful in numerous tasks, in which several state-of-the-art methods rely on transformers for better learning. In computer vision, the 3D field has also witnessed an increase in employing the transformer for 3D convolutional neural networks and multi-layer perceptron networks. Although a number of surveys have focused on transformers in vision in general, 3D vision requires special attention due to the difference in data representation and processing when compared to 2D vision. In this work, we present a systematic and thorough review of more than 100 transformer methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others. We discuss transformer design in 3D vision, which allows it to process data with various 3D representations. For each application, we highlight key properties and contributions of proposed transformer-based methods. To assess the competitiveness of these methods, we compare their performance to common non-transformer methods on 12 3D benchmarks. We conclude the survey by discussing different open directions and challenges for transformers in 3D vision. In addition to the presented papers, we aim to frequently update the latest relevant papers along with their corresponding implementations at: https://github.com/lahoud/3d-vision-transformers.
