Paper Title
Deep feature fusion for self-supervised monocular depth prediction
Paper Authors
Paper Abstract
Recent advances in end-to-end unsupervised learning have significantly improved the performance of monocular depth prediction and alleviated the requirement for ground truth depth. Although a plethora of work has been done to enforce various structural constraints by incorporating multiple losses utilising smoothness, left-right consistency, regularisation and matching surface normals, few of them take into consideration the multi-scale structures present in real-world images. Most works utilise a VGG16 or ResNet50 model pre-trained on ImageNet for predicting depth. We propose a deep feature fusion method utilising features at multiple scales for learning self-supervised depth from scratch. Our fusion network selects features from both upper and lower levels at every level in the encoder network, thereby creating multiple feature pyramid sub-networks that are fed to the decoder after applying the CoordConv solution. We also propose a refinement module that learns higher-scale residual depth from a combination of higher-level deep features and lower-level residual depth, using a pixel-shuffling framework to super-resolve the lower-level residual depth. We select the KITTI dataset for evaluation and show that our proposed architecture can produce better or comparable results in depth prediction.
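The abstract names two generic building blocks whose mechanics are well known even without the authors' code: CoordConv (appending normalised coordinate channels before a convolution, per Liu et al., 2018) and pixel shuffling for super-resolving a residual depth map. Below is a minimal PyTorch sketch of both, assuming plausible channel counts and module names (CoordConv2d, ResidualDepthRefiner are illustrative names, not the paper's); it is not the authors' implementation.

```python
import torch
import torch.nn as nn


class CoordConv2d(nn.Module):
    """Conv2d preceded by concatenation of normalised x/y coordinate channels
    (the CoordConv idea referenced in the abstract)."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # Two extra input channels carry the x and y coordinate maps.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        # Coordinate grids normalised to [-1, 1], broadcast over the batch.
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))


class ResidualDepthRefiner(nn.Module):
    """Hypothetical refinement step: super-resolve a lower-level residual
    depth map via pixel shuffling, then fuse it with higher-level features
    to predict the higher-scale residual depth."""

    def __init__(self, feat_channels, scale=2):
        super().__init__()
        # Expand the 1-channel residual depth to scale**2 channels so that
        # PixelShuffle can rearrange them into a (scale x scale) upsampling.
        self.expand = nn.Conv2d(1, scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.fuse = nn.Sequential(
            CoordConv2d(feat_channels + 1, 32, kernel_size=3, padding=1),
            nn.ELU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, features, residual_depth_low):
        # Super-resolve the lower-level residual depth ...
        up = self.shuffle(self.expand(residual_depth_low))
        # ... and predict the higher-scale residual from the fused input.
        return self.fuse(torch.cat([features, up], dim=1))


if __name__ == "__main__":
    refiner = ResidualDepthRefiner(feat_channels=64, scale=2)
    feats = torch.randn(1, 64, 128, 416)   # higher-level deep features
    res_lo = torch.randn(1, 1, 64, 208)    # lower-level residual depth
    print(refiner(feats, res_lo).shape)    # torch.Size([1, 1, 128, 416])
```

Pixel shuffling is a natural choice here because it upsamples by rearranging learned channels rather than interpolating, which preserves the sub-pixel detail a residual depth map is meant to carry.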