Paper Title

Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances

Paper Authors

Anna Hilsmann, Philipp Fechteler, Wieland Morgenstern, Wolfgang Paier, Ingo Feldmann, Oliver Schreer, Peter Eisert

Paper Abstract

In this paper, we present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances. Going beyond the application of free-viewpoint volumetric video, we allow re-animation and alteration of an actor's performance through (i) the enrichment of the captured data with semantics and animation properties and (ii) applying hybrid geometry- and video-based animation methods that allow a direct animation of the high-quality data itself instead of creating an animatable model that resembles the captured data. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data, followed by an automatic rigging of each frame using a parametric shape-adaptive full human body model. Our hybrid geometry- and video-based animation approaches combine the flexibility of classical CG animation with the realism of real captured data. For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose. Further, we treat the face differently from the body in a hybrid geometry- and video-based animation approach where coarse movements and poses are modeled in the geometry only, while very fine and subtle details in the face, often lacking in purely geometric methods, are captured in video-based textures. These are processed to be interactively combined to form new facial expressions. On top of that, we learn the appearance of regions that are challenging to synthesize, such as the teeth or the eyes, and fill in missing regions realistically in an autoencoder-based approach. This paper covers the full pipeline from capturing and producing high-quality video content, through its enrichment with semantics and deformation properties for re-animation, to the processing of the data for the final hybrid animation.
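The kinematic deformation of captured frames to fit a desired pose, as described in the abstract, is commonly realized via linear blend skinning over a rigged body model. The following is a minimal illustrative sketch of that general formulation (all function names and data are hypothetical, not the authors' actual code):

```python
# Illustrative linear blend skinning (LBS): each captured vertex is
# deformed toward a target pose as a weighted blend of rigid bone
# transforms. This sketches the standard formulation only; it is not
# the paper's implementation.
import numpy as np

def blend_skin(vertices, weights, bone_transforms):
    """Deform rest-pose vertices toward a target pose.

    vertices:        (V, 3) rest-pose positions
    weights:         (V, B) per-vertex skinning weights (rows sum to 1)
    bone_transforms: (B, 4, 4) rigid transforms mapping each bone from
                     the rest pose to the desired pose
    """
    # Homogeneous coordinates: (V, 4)
    v_h = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    # Each bone transforms every vertex: (B, V, 4)
    per_bone = np.einsum('bij,vj->bvi', bone_transforms, v_h)
    # Blend the per-bone results by the skinning weights: (V, 4)
    blended = np.einsum('vb,bvi->vi', weights, per_bone)
    return blended[:, :3]

# Toy example: two vertices, two bones; bone 1 is translated by +1 in x.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.0, 1.0]])  # each vertex bound to one bone
T = np.stack([np.eye(4), np.eye(4)])
T[1, 0, 3] = 1.0  # move bone 1 along x
posed = blend_skin(verts, w, T)
# vertex 0 follows bone 0 (unchanged); vertex 1 follows bone 1 (+1 in x)
```

In the pipeline described above, the skinning weights would come from the automatic per-frame rigging with the parametric body model, so the captured geometry itself can be reposed directly rather than replaced by a synthetic model.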
