Paper Title
Generalizable Neural Radiance Fields for Novel View Synthesis with Transformer
Paper Authors
Paper Abstract
We propose a Transformer-based NeRF (TransNeRF) to learn a generic neural radiance field conditioned on observed-view images for the novel view synthesis task. In contrast, existing MLP-based NeRFs cannot directly receive an arbitrary number of observed views and require an auxiliary pooling-based operation to fuse source-view information, resulting in the loss of complicated relationships between the source views and the target rendering view. Furthermore, current approaches process each 3D point individually and ignore the local consistency of the radiance-field scene representation. These limitations can potentially reduce their performance in challenging real-world applications, where large differences between source views and a novel rendering view may exist. To address these challenges, our TransNeRF utilizes the attention mechanism to naturally decode deep associations of an arbitrary number of source views into a coordinate-based scene representation. Local consistency of shape and appearance is considered in both the ray-cast space and the surrounding-view space within a unified Transformer network. Experiments demonstrate that our TransNeRF, trained on a wide variety of scenes, achieves better performance than state-of-the-art image-based neural rendering methods in both scene-agnostic and per-scene finetuning settings, especially when there is a considerable gap between the source views and the rendering view.
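The central mechanism described in the abstract, fusing an arbitrary number of source-view features into a coordinate-conditioned representation via attention rather than pooling, can be illustrated with a minimal sketch. The PyTorch code below is not the authors' implementation; the module name (SourceViewAttention), feature dimensions, and the sinusoidal positional encoding are assumptions chosen only to show how cross-attention lets a query 3D point weight each source view by relevance while remaining agnostic to how many views are supplied.

```python
# Minimal sketch (not the authors' code): cross-attention that fuses an
# arbitrary number of source-view features into a coordinate-conditioned
# feature. Module names, dimensions, and the encoding are illustrative.
import torch
import torch.nn as nn


def positional_encoding(x: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    """NeRF-style sinusoidal encoding of 3D coordinates -> (..., 3*2*num_freqs)."""
    freqs = 2.0 ** torch.arange(num_freqs, device=x.device) * torch.pi
    angles = x[..., None] * freqs                    # (..., 3, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)


class SourceViewAttention(nn.Module):
    """A query 3D point attends over N source-view features.

    Unlike mean/max pooling, attention weights each source view by its
    relevance to the query point, and N may differ between calls.
    """

    def __init__(self, point_dim: int = 60, feat_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.to_query = nn.Linear(point_dim, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 4),                  # RGB + density
        )

    def forward(self, points: torch.Tensor, src_feats: torch.Tensor) -> torch.Tensor:
        # points:    (B, 3)            sampled 3D coordinates along camera rays
        # src_feats: (B, N, feat_dim)  features of the same point projected
        #                              into N source views (N is arbitrary)
        q = self.to_query(positional_encoding(points)).unsqueeze(1)  # (B, 1, feat_dim)
        fused, _ = self.attn(q, src_feats, src_feats)                # (B, 1, feat_dim)
        return self.head(fused.squeeze(1))                           # (B, 4)


if __name__ == "__main__":
    model = SourceViewAttention()
    pts = torch.rand(1024, 3)          # 1024 ray samples
    feats = torch.rand(1024, 5, 256)   # features from 5 source views
    print(model(pts, feats).shape)     # torch.Size([1024, 4])
```

Because the source views enter only as the key/value sequence of the attention layer, the same module accepts any number of views at inference time, which is the property the abstract contrasts with fixed pooling-based fusion.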