Paper Title
Don't Forget The Past: Recurrent Depth Estimation from Monocular Video
Paper Authors

Paper Abstract
Autonomous cars need continuously updated depth information. Thus far, depth is mostly estimated independently for a single frame at a time, even if the method starts from video input. Our method produces a time series of depth maps, which makes it an ideal candidate for online learning approaches. In particular, we put three different types of depth estimation (supervised depth prediction, self-supervised depth prediction, and self-supervised depth completion) into a common framework. We integrate the corresponding networks with a ConvLSTM such that the spatiotemporal structure of depth across frames can be exploited to yield more accurate depth estimates. Our method is flexible. It can be applied to monocular videos only or be combined with different types of sparse depth patterns. We carefully study the architecture of the recurrent network and its training strategy. We are the first to successfully exploit recurrent networks for real-time self-supervised monocular depth estimation and completion. Extensive experiments show that our recurrent method outperforms its image-based counterpart consistently and significantly in both self-supervised scenarios. It also outperforms previous depth estimation methods from all three of these groups. Please refer to https://www.trace.ethz.ch/publications/2020/rec_depth_estimation/ for details.
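The abstract does not specify the exact ConvLSTM architecture used, but the core idea of carrying depth information across frames can be illustrated with a minimal ConvLSTM cell: the four LSTM gates are computed by a single convolution over the concatenated current input and previous hidden state, so the memory is itself a spatial feature map. This is a hedged sketch in plain NumPy (class and function names are illustrative, not from the paper):

```python
import numpy as np

def conv2d(x, w):
    """'Same'-padded stride-1 convolution. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1], x.shape[2]
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for di in range(k):
                for dj in range(k):
                    out[o] += w[o, i, di, dj] * xp[i, di:di + H, dj:dj + W]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Minimal ConvLSTM cell: gates i, f, o, g share one convolution whose
    output channels are split four ways. Hidden/cell states are feature maps."""
    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # Weights for the four gates, stacked along the output-channel axis.
        self.w = 0.1 * rng.standard_normal((4 * hid_ch, in_ch + hid_ch, k, k))

    def step(self, x, h, c):
        z = conv2d(np.concatenate([x, h], axis=0), self.w)
        i, f, o, g = np.split(z, 4, axis=0)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell memory
        h_new = sigmoid(o) * np.tanh(c_new)               # gated spatial output
        return h_new, c_new

# Unrolling over a video: the hidden state carries depth context frame to frame.
cell = ConvLSTMCell(in_ch=1, hid_ch=2)
h = np.zeros((2, 4, 4))
c = np.zeros((2, 4, 4))
for t in range(3):
    frame_features = np.full((1, 4, 4), float(t))  # stand-in for encoder output
    h, c = cell.step(frame_features, h, c)
```

In the paper's setting, `x` would be an encoder feature map of the current frame and `h` would feed the depth decoder, so each prediction is conditioned on the accumulated temporal context rather than on a single image.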