TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer

Paper Title

TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer

Authors

Dou, Zhiyang, Wu, Qingxuan, Lin, Cheng, Cao, Zeyu, Wu, Qiangqiang, Wan, Weilin, Komura, Taku, Wang, Wenping

Abstract

In this paper, we introduce a set of simple yet effective TOken REduction (TORE) strategies for Transformer-based Human Mesh Recovery from monocular images. Current SOTA performance is achieved by Transformer-based structures. However, they suffer from high model complexity and computation cost caused by redundant tokens. We propose token reduction strategies based on two important aspects, i.e., the 3D geometry structure and 2D image feature, where we hierarchically recover the mesh geometry with priors from body structure and conduct token clustering to pass fewer but more discriminative image feature tokens to the Transformer. Our method massively reduces the number of tokens involved in high-complexity interactions in the Transformer. This leads to a significantly reduced computational cost while still achieving competitive or even higher accuracy in shape recovery. Extensive experiments across a wide range of benchmarks validate the superior effectiveness of the proposed method. We further demonstrate the generalizability of our method on hand mesh recovery. Visit our project page at https://frank-zy-dou.github.io/projects/Tore/index.html.
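The abstract does not say how the token clustering is done, so the following is only a minimal sketch of the general idea, assuming plain k-means over image feature tokens. The function names `cluster_tokens` and `attention_pairs`, the 196-token input (a hypothetical 14x14 feature grid), the 64-dimensional features, and the target of 49 tokens are all illustrative assumptions, not values from the paper. It shows why passing fewer tokens to a Transformer matters: full self-attention compares every token with every other token, so the number of pairwise interactions grows with the square of the token count.

```python
import numpy as np


def cluster_tokens(tokens, num_clusters, num_iters=10, seed=0):
    """Reduce (N, D) tokens to (num_clusters, D) cluster centroids.

    Plain k-means, used here only as a stand-in for whatever clustering
    the paper actually applies (an assumption, not the authors' method).
    """
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]
    # Initialise centroids from distinct, randomly chosen tokens.
    centroids = tokens[rng.choice(n, size=num_clusters, replace=False)].copy()
    for _ in range(num_iters):
        # Squared Euclidean distance from every token to every centroid: (N, K).
        dists = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_clusters):
            members = tokens[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids


def attention_pairs(num_tokens):
    """Number of pairwise interactions in full self-attention."""
    return num_tokens * num_tokens


# Hypothetical input: 196 image feature tokens of dimension 64.
tokens = np.random.default_rng(1).normal(size=(196, 64))
reduced = cluster_tokens(tokens, num_clusters=49)
print(reduced.shape, attention_pairs(196) / attention_pairs(49))
```

Under these assumed sizes, cutting the token count by a factor of 4 cuts the pairwise attention interactions by a factor of 16. The savings reported by the paper depend on its own architecture and token counts.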
