Paper Title

FCDSN-DC: An Accurate and Lightweight Convolutional Neural Network for Stereo Estimation with Depth Completion

Paper Authors

Dominik Hirner, Friedrich Fraundorfer

Paper Abstract

We propose an accurate and lightweight convolutional neural network for stereo estimation with depth completion. We name this method fully-convolutional deformable similarity network with depth completion (FCDSN-DC). This method extends FC-DCNN by improving the feature extractor, adding a network structure for training highly accurate similarity functions, and adding a network structure for filling in inconsistent disparity estimates. The whole method consists of three parts. The first part consists of fully-convolutional densely connected layers that compute expressive features of rectified image pairs. The second part of our network learns highly accurate similarity functions between these learned features. It consists of densely-connected convolution layers with a deformable convolution block at the end to further improve the accuracy of the results. After this step, an initial disparity map is created and the left-right consistency check is performed in order to remove inconsistent points. The last part of the network then uses this input together with the corresponding left RGB image to train a network that fills in the missing measurements. Consistent depth estimates are gathered around invalid points and are passed together with the RGB values into a shallow CNN structure in order to recover the missing values. We evaluate our method on challenging real-world indoor and outdoor scenes, in particular Middlebury, KITTI and ETH3D, where it produces competitive results. We furthermore show that this method generalizes well and is well suited for many applications without the need for further training. The code of our full framework is available at: https://github.com/thedodo/FCDSN-DC
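
The left-right consistency check mentioned in the abstract can be illustrated with a minimal NumPy sketch. The one-pixel threshold and the use of NaN to mark invalid points are assumptions for illustration, not necessarily the exact choices made in FCDSN-DC.

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, max_diff=1.0):
    """Mark left-view disparities as invalid when they disagree with the right view.

    A left pixel (y, x) with disparity d corresponds to the right pixel
    (y, x - d); if the right disparity map differs there by more than
    max_diff, the left estimate is considered inconsistent.
    """
    disp_left = np.asarray(disp_left, dtype=float)
    disp_right = np.asarray(disp_right, dtype=float)
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xr = np.round(xs - disp_left).astype(int)      # matching column in the right view
    in_bounds = (xr >= 0) & (xr < w)
    sampled = np.full((h, w), np.inf)
    sampled[in_bounds] = disp_right[ys[in_bounds], xr[in_bounds]]
    consistent = np.abs(disp_left - sampled) <= max_diff
    out = disp_left.copy()
    out[~consistent] = np.nan                      # invalid points, to be filled by the completion network
    return out
```

The second part of the network (densely-connected convolutions followed by a deformable convolution block) could be sketched roughly as follows in PyTorch. Layer count, channel widths and the single-channel output are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DenseDeformBlock(nn.Module):
    """Hypothetical densely connected block ending in a deformable convolution."""

    def __init__(self, in_ch=64, growth=32, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            ch += growth                           # dense connectivity: all previous outputs are concatenated
        # Offsets for a 3x3 deformable kernel: 2 * 3 * 3 = 18 channels.
        self.offset = nn.Conv2d(ch, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(ch, 1, kernel_size=3, padding=1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        dense = torch.cat(feats, dim=1)
        return self.deform(dense, self.offset(dense))

# Example: a (1, 64, 32, 32) feature map produces a (1, 1, 32, 32) similarity map.
# out = DenseDeformBlock()(torch.randn(1, 64, 32, 32))
```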
