Paper Title

Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics

Paper Authors

Arnav Varma, Hemang Chawla, Bahram Zonooz, Elahe Arani

Paper Abstract

The advent of autonomous driving and advanced driver assistance systems necessitates continuous developments in computer vision for 3D scene understanding. Self-supervised monocular depth estimation, a method for pixel-wise distance estimation of objects from a single camera without the use of ground truth labels, is an important task in 3D scene understanding. However, existing methods for this task are limited to convolutional neural network (CNN) architectures. In contrast with CNNs that use localized linear operations and lose feature resolution across the layers, vision transformers process at constant resolution with a global receptive field at every stage. While recent works have compared transformers against their CNN counterparts for tasks such as image classification, no study exists that investigates the impact of using transformers for self-supervised monocular depth estimation. Here, we first demonstrate how to adapt vision transformers for self-supervised monocular depth estimation. Thereafter, we compare the transformer and CNN-based architectures for their performance on KITTI depth prediction benchmarks, as well as their robustness to natural corruptions and adversarial attacks, including when the camera intrinsics are unknown. Our study demonstrates how transformer-based architecture, though lower in run-time efficiency, achieves comparable performance while being more robust and generalizable.
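For readers unfamiliar with the self-supervised setup the abstract refers to, the sketch below illustrates the core training signal: a predicted depth map and relative camera pose are used to warp a neighboring source frame into the target view, and the photometric reconstruction error supervises the networks. When the intrinsics are unknown, the pinhole matrix K can itself be made of learnable parameters optimized jointly with the depth and pose networks. This is a minimal illustration in generic PyTorch under a pinhole camera assumption; the function names, shapes, and constants are our own, and the SSIM and smoothness terms commonly used in practice are omitted, so it should not be read as the paper's implementation.

```python
# Minimal sketch (not the paper's code): photometric self-supervision for
# monocular depth. Depth and pose come from the depth and pose networks
# (transformer- or CNN-based, omitted here); K is the pinhole intrinsics
# matrix, which can be replaced by learnable parameters when the camera
# calibration is unknown. Names, shapes, and constants are assumptions.
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    """Lift every pixel into 3D camera coordinates using the predicted depth."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=depth.device),
        torch.arange(w, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()  # (3, H, W)
    rays = K_inv @ pix.reshape(3, -1)                                # (3, H*W)
    return depth.reshape(b, 1, -1) * rays.unsqueeze(0)               # (B, 3, H*W)


def project(points, K, T, h, w):
    """Apply the predicted pose T, reproject with K, and build a sampling grid."""
    points_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)  # (B, 4, H*W)
    cam_src = (T @ points_h)[:, :3]                  # points in the source view
    pix = K.unsqueeze(0) @ cam_src
    pix = pix[:, :2] / (pix[:, 2:3] + 1e-7)          # perspective division
    x = 2.0 * pix[:, 0] / (w - 1) - 1.0              # normalize for grid_sample
    y = 2.0 * pix[:, 1] / (h - 1) - 1.0
    return torch.stack([x, y], dim=-1).reshape(-1, h, w, 2)


def photometric_loss(target, source, depth, T, K):
    """Warp the source frame into the target view and penalize the difference.
    Plain L1 here; SSIM and edge-aware smoothness terms are omitted for brevity."""
    _, _, h, w = target.shape
    grid = project(backproject(depth, torch.linalg.inv(K)), K, T, h, w)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (target - warped).abs().mean()


if __name__ == "__main__":
    # Toy example with random tensors; in training, depth and T are network
    # outputs, and K could be nn.Parameters when the intrinsics are unknown.
    b, h, w = 2, 96, 320
    target, source = torch.rand(b, 3, h, w), torch.rand(b, 3, h, w)
    depth = torch.rand(b, 1, h, w) + 0.1
    T = torch.eye(4).unsqueeze(0).repeat(b, 1, 1)
    K = torch.tensor([[0.58 * w, 0.0, 0.5 * w],
                      [0.0, 1.92 * h, 0.5 * h],
                      [0.0, 0.0, 1.0]])
    print(photometric_loss(target, source, depth, T, K).item())
```

Minimizing this reconstruction error jointly pushes the depth network, the pose network, and, when they are unknown, the intrinsics toward geometrically consistent estimates, regardless of whether the encoder is a CNN or a vision transformer.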
