Paper Title

BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation

Authors

Zhenyu Li, Xuyang Wang, Xianming Liu, Junjun Jiang

Abstract

Monocular depth estimation is a fundamental task in computer vision and has drawn increasing attention. Recently, some methods have reformulated it as a classification-regression task to boost model performance, where continuous depth is estimated via a linear combination of predicted probability distributions and discrete bins. In this paper, we present a novel framework called BinsFormer, tailored for classification-regression-based depth estimation. It focuses on two crucial components of this task: 1) proper generation of adaptive bins and 2) sufficient interaction between probability distribution and bin predictions. Specifically, we employ a Transformer decoder to generate bins, novelly viewing this as a direct set-to-set prediction problem. We further integrate a multi-scale decoder structure to achieve a comprehensive understanding of spatial geometry information and estimate depth maps in a coarse-to-fine manner. Moreover, an extra scene understanding query is proposed to improve estimation accuracy; it turns out that the model can implicitly learn useful information from an auxiliary environment classification task. Extensive experiments on the KITTI, NYU, and SUN RGB-D datasets demonstrate that BinsFormer surpasses state-of-the-art monocular depth estimation methods by prominent margins. Code and pretrained models will be made publicly available at \url{https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox}.
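The abstract's core formulation — continuous depth recovered as a linear combination of a predicted per-pixel probability distribution over discrete depth bins — can be sketched in a few lines. This is a minimal illustration of that combination step only, not the paper's implementation; the names `depth_from_bins`, `probs`, and `bin_edges` are assumed for illustration.

```python
import numpy as np

def depth_from_bins(probs: np.ndarray, bin_edges: np.ndarray) -> np.ndarray:
    """Combine per-bin probabilities with bin centers into a depth map.

    probs:     (N, H, W) array; probs[:, i, j] is a distribution over N bins.
    bin_edges: (N + 1,) array of (possibly adaptive) bin boundaries.
    Returns a (H, W) depth map: depth[i, j] = sum_n probs[n, i, j] * center[n].
    """
    # Bin centers are the midpoints of consecutive bin edges.
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    # Per-pixel expectation of depth under the predicted distribution.
    return np.einsum("nhw,n->hw", probs, centers)

# Example: 4 bins spanning [0, 10] m, a single pixel whose probability
# mass sits entirely on the third bin (center 6.25 m).
bin_edges = np.linspace(0.0, 10.0, 5)   # [0, 2.5, 5, 7.5, 10]
probs = np.zeros((4, 1, 1))
probs[2, 0, 0] = 1.0
print(depth_from_bins(probs, bin_edges)[0, 0])  # 6.25
```

Because the output is a soft expectation rather than a hard argmax over bins, the prediction stays continuous and differentiable, which is what lets such classification-regression methods be trained end-to-end.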
