Paper Title

HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders

Paper Authors

Dhingra, Naina

Paper Abstract

In this paper, HeadPosr is proposed to predict head poses from a single RGB image. \textit{HeadPosr} uses a novel architecture that includes a transformer encoder. Concretely, it consists of: (1) a backbone; (2) a connector; (3) a transformer encoder; (4) a prediction head. The significance of using a transformer encoder for HPE is studied. An extensive ablation study is performed by varying (1) the number of encoders; (2) the number of heads; (3) the position embeddings; (4) the activations; and (5) the input channel size of the transformer used in HeadPosr. Further studies on using (1) different backbones and (2) different learning rates are also shown. The detailed experiments and ablation studies are conducted on three widely used open-source datasets for HPE, i.e., the 300W-LP, AFLW2000, and BIWI datasets. Experiments show that \textit{HeadPosr} outperforms all state-of-the-art methods, including both landmark-free methods and those based on landmark or depth estimation, on the AFLW2000 and BIWI datasets when trained on 300W-LP. It also outperforms them when the results are averaged across the compared datasets, thereby setting a benchmark for the HPE problem and demonstrating the effectiveness of transformers over the state of the art.
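
To make the four-stage pipeline described above concrete, the following is a minimal sketch of a HeadPosr-style model in PyTorch. The choice of a ResNet-50 backbone, a 1x1-conv connector, a learned positional embedding, the layer/head counts, and the mean-pooled prediction head regressing yaw/pitch/roll are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a HeadPosr-style architecture:
# backbone -> connector -> transformer encoder -> prediction head.
# Sizes and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class HeadPosrSketch(nn.Module):
    def __init__(self, d_model=256, num_encoders=6, num_heads=8):
        super().__init__()
        # (1) Backbone: ResNet-50 without its pooling/classification layers.
        backbone = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, 7, 7) for 224x224 input
        # (2) Connector: project backbone channels to the transformer width.
        self.connector = nn.Conv2d(2048, d_model, kernel_size=1)
        # Learned positional embedding over the flattened 7x7 = 49 spatial tokens (assumption).
        self.pos_embed = nn.Parameter(torch.zeros(1, 49, d_model))
        # (3) Transformer encoder stack.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True, activation="relu"
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_encoders)
        # (4) Prediction head: pooled token features -> (yaw, pitch, roll).
        self.head = nn.Linear(d_model, 3)

    def forward(self, x):
        f = self.connector(self.backbone(x))            # (B, d_model, 7, 7)
        tokens = f.flatten(2).transpose(1, 2)           # (B, 49, d_model)
        tokens = self.encoder(tokens + self.pos_embed)  # (B, 49, d_model)
        return self.head(tokens.mean(dim=1))            # (B, 3) Euler angles


# Example: one forward pass on a single 224x224 RGB image.
if __name__ == "__main__":
    model = HeadPosrSketch()
    angles = model(torch.randn(1, 3, 224, 224))
    print(angles.shape)  # torch.Size([1, 3])
```

The ablation axes mentioned in the abstract (number of encoders, number of heads, activation, input channel size) map directly to the constructor arguments of this sketch, which is why they are exposed as parameters here.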
