Paper Title
Single-View 3D Object Reconstruction from Shape Priors in Memory
Paper Authors
Abstract
Existing methods for single-view 3D object reconstruction directly learn to transform image features into 3D representations. However, these methods are vulnerable to images containing noisy backgrounds and heavy occlusions, because the extracted image features do not contain enough information to reconstruct high-quality 3D shapes. Humans routinely use incomplete or noisy visual cues from an image to retrieve similar 3D shapes from memory and reconstruct the 3D shape of an object. Inspired by this, we propose a novel method, named Mem3D, that explicitly constructs shape priors to supplement the information missing from the image. Specifically, the shape priors take the form of "image-voxel" pairs in a memory network, which are stored via a carefully designed writing strategy during training. We also propose a voxel triplet loss function that helps retrieve, from the shape priors, precise 3D shapes that are highly relevant to the input image. An LSTM-based shape encoder is introduced to extract information from the retrieved 3D shapes, which is useful for recovering the 3D shape of an object that is heavily occluded or situated in a complex environment. Experimental results demonstrate that Mem3D significantly improves reconstruction quality and performs favorably against state-of-the-art methods on the ShapeNet and Pix3D datasets.
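The abstract does not spell out the voxel triplet loss, but it is described as pulling an image's features toward the shapes most related to it. A minimal generic sketch, assuming a standard margin-based triplet loss over Euclidean distances between an image embedding (anchor), the embedding of its ground-truth voxel shape (positive), and the embedding of an unrelated shape (negative), might look like this; all function names and the distance choice here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def voxel_triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss sketch: encourage the anchor (image embedding)
    to be closer to the positive (matching voxel-shape embedding) than to
    the negative (non-matching shape embedding) by at least `margin`.

    NOTE: the exact distance metric and margin used by Mem3D are
    assumptions here; see the paper for the precise definition.
    """
    d_pos = float(np.sum((anchor - positive) ** 2))  # squared L2 to matching shape
    d_neg = float(np.sum((anchor - negative) ** 2))  # squared L2 to unrelated shape
    return max(0.0, d_pos - d_neg + margin)

# Example: a well-separated triplet incurs zero loss,
# while a triplet with the negative closer than the positive is penalized.
a = np.zeros(8)
good = voxel_triplet_loss(a, positive=np.zeros(8), negative=np.ones(8))  # 0.0
bad = voxel_triplet_loss(a, positive=np.ones(8), negative=np.zeros(8))   # 9.0
```

At training time such a loss would be applied jointly with the reconstruction objective, so that memory keys (image features) become discriminative enough for nearest-neighbor retrieval of shape priors.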