Paper Title

Auto-Rectify Network for Unsupervised Indoor Depth Estimation

Authors

Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua Shen, Ian Reid

Abstract

Single-view depth estimation using CNNs trained from unlabelled videos has shown significant promise. However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices. In this work, we establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth. Our fundamental analysis suggests that the rotation behaves as noise during training, as opposed to the translation (baseline), which provides supervision signals. To address the challenge, we propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning. The significantly improved performance validates our motivation. Towards end-to-end learning without requiring pre-processing, we propose an Auto-Rectify Network with novel loss functions, which can automatically learn to rectify images during training. Consequently, our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset. We also demonstrate the generalization of our trained model on ScanNet and Make3D, and the universality of our proposed learning method on the 7-Scenes and KITTI datasets.
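The rotation-removal idea in the abstract rests on a standard geometric fact: image motion induced purely by a camera rotation R is a homography H = K·R·K⁻¹ (independent of scene depth), so warping one frame by this homography cancels the relative rotation and leaves only translation-induced parallax, which is the signal that supervises depth. Below is a minimal sketch of computing that rectifying homography; the function name, the intrinsics `K`, and the example rotation are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def rectifying_homography(K, R_rel):
    """Homography that removes the relative rotation R_rel between two views.

    Warping the source image by H = K @ R_rel @ inv(K) aligns it rotationally
    with the target view, so the residual image motion comes from the
    translation (baseline) alone. Illustrative sketch, not the paper's API.
    """
    H = K @ R_rel @ np.linalg.inv(K)
    return H / H[2, 2]  # normalise so that H[2, 2] == 1

# Example: a small pan (rotation about the camera's y-axis).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
theta = np.deg2rad(5.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [           0.0, 1.0,           0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
H = rectifying_homography(K, R)

# Sanity check: zero relative rotation yields the identity homography,
# i.e. the image needs no rectification.
assert np.allclose(rectifying_homography(K, np.eye(3)), np.eye(3))
```

In practice one would apply `H` to the source frame with an image-warping routine (e.g. OpenCV's `cv2.warpPerspective`) before feeding the pair to the depth network.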
