Paper Title

GLPanoDepth: Global-to-Local Panoramic Depth Estimation

Paper Authors

Jiayang Bai, Shuichang Lai, Haoyu Qin, Jie Guo, Yanwen Guo

Paper Abstract

In this paper, we propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image. An omnidirectional image has a full field-of-view, providing much more complete descriptions of the scene than perspective images. However, fully-convolutional networks that most current solutions rely on fail to capture rich global contexts from the panorama. To address this issue and also the distortion of equirectangular projection in the panorama, we propose Cubemap Vision Transformers (CViT), a new transformer-based architecture that can model long-range dependencies and extract distortion-free global features from the panorama. We show that cubemap vision transformers have a global receptive field at every stage and can provide globally coherent predictions for spherical signals. To preserve important local features, we further design a convolution-based branch in our pipeline (dubbed GLPanoDepth) and fuse global features from cubemap vision transformers at multiple scales. This global-to-local strategy allows us to fully exploit useful global and local features in the panorama, achieving state-of-the-art performance in panoramic depth estimation.
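
To make the global-to-local strategy concrete, below is a minimal PyTorch sketch of a dual-branch model in the spirit of the abstract: a transformer branch over cubemap faces supplies globally attended, distortion-free features, a convolutional branch over the equirectangular image supplies local features, and the two are fused before a dense depth head. All module names, layer sizes, and the fusion scheme are illustrative assumptions, not the authors' released implementation; in particular, the re-projection of cubemap features back onto the equirectangular grid is faked here with a mean over the six faces plus a bilinear resize, purely to keep the sketch short.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CubemapViTBranch(nn.Module):
    """Toy stand-in for the CViT branch: patch-embeds each cubemap face
    and runs a shared transformer encoder, so every token attends
    globally within its face."""

    def __init__(self, face_size=64, patch=8, dim=128, depth=2, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.grid, self.dim = face_size // patch, dim

    def forward(self, faces):                        # faces: (B, 6, 3, H, W)
        b, f = faces.shape[:2]
        x = self.patch_embed(faces.flatten(0, 1))    # (B*6, dim, g, g)
        x = x.flatten(2).transpose(1, 2)             # (B*6, g*g, dim) tokens
        x = self.encoder(x)                          # global self-attention
        return x.transpose(1, 2).reshape(b, f, self.dim, self.grid, self.grid)


class GlobalToLocalDepth(nn.Module):
    """Convolutional branch on the equirectangular image, fused with the
    global cubemap features before a dense depth head. The cube-to-
    equirectangular re-projection is a placeholder (mean over faces +
    bilinear resize)."""

    def __init__(self, dim=128):
        super().__init__()
        self.cvit = CubemapViTBranch(dim=dim)
        self.local = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(2 * dim, dim, 1)       # merge global + local
        self.head = nn.Conv2d(dim, 1, 3, padding=1)  # per-pixel depth

    def forward(self, equirect, faces):              # equirect: (B, 3, He, We)
        g = self.cvit(faces).mean(dim=1)             # placeholder re-projection
        g = F.interpolate(g, size=equirect.shape[-2:], mode="bilinear",
                          align_corners=False)
        loc = self.local(equirect)                   # fine local features
        return self.head(self.fuse(torch.cat([g, loc], dim=1)))


if __name__ == "__main__":
    model = GlobalToLocalDepth()
    pano = torch.randn(1, 3, 128, 256)               # equirectangular panorama
    cube = torch.randn(1, 6, 3, 64, 64)              # six cubemap faces
    print(model(pano, cube).shape)                   # torch.Size([1, 1, 128, 256])

The key design choice the sketch mirrors is that the transformer branch sees the scene through cubemap faces, where perspective projection avoids the latitude-dependent stretching of the equirectangular format, while the convolutional branch keeps full-resolution local detail; the real pipeline fuses these at multiple scales rather than the single fusion shown here.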
