Paper Title
Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild
Paper Authors
Paper Abstract
Recently, heatmap regression models have become popular due to their superior performance in locating facial landmarks. However, three major problems still exist among these models: (1) they are computationally expensive; (2) they usually lack explicit constraints on global shapes; (3) domain gaps are commonly present. To address these problems, we propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection. The proposed model is equipped with a novel detection head based on heatmap regression, which conducts score and offset predictions simultaneously on low-resolution feature maps. By doing so, repeated upsampling layers are no longer necessary, greatly reducing inference time without sacrificing model accuracy. In addition, a simple but effective neighbor regression module is proposed to enforce local constraints by fusing predictions from neighboring landmarks, which enhances the robustness of the new detection head. To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum. This training strategy mines more reliable pseudo-labels from unlabeled data across domains by starting with an easier task, then gradually increasing the difficulty to provide more precise labels. Extensive experiments demonstrate the superiority of PIPNet, which obtains state-of-the-art results on three out of six popular benchmarks under the supervised setting. The results on two cross-domain test sets are also consistently improved compared to the baselines. Notably, our lightweight version of PIPNet runs at 35.7 FPS and 200 FPS on CPU and GPU, respectively, while maintaining accuracy competitive with state-of-the-art methods. The code of PIPNet is available at https://github.com/jhb86253817/PIPNet.
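To illustrate the core idea of the detection head described above, here is a minimal NumPy sketch of decoding a single landmark from low-resolution score and offset maps. The function name, map shapes, and the stride value are hypothetical; the paper's actual head also includes the neighbor regression module, which is omitted here.

```python
import numpy as np

def decode_landmark(score_map, offset_x, offset_y, stride=32):
    """Decode one landmark from PIP-style low-resolution predictions.

    score_map  : (H, W) grid-cell scores on the low-resolution feature map
    offset_x/y : (H, W) per-cell offsets, in fractions of a grid cell
    stride     : input-image pixels per grid cell (hypothetical value)
    """
    # Pick the highest-scoring grid cell directly on the low-resolution
    # map -- no repeated upsampling layers are needed.
    idx = np.argmax(score_map)
    gy, gx = np.unravel_index(idx, score_map.shape)
    # Refine the coarse cell location with the predicted sub-cell offsets.
    x = (gx + offset_x[gy, gx]) * stride
    y = (gy + offset_y[gy, gx]) * stride
    return x, y

# Toy example on an 8x8 grid: peak at cell (3, 5), offsets (0.25, 0.5).
score = np.zeros((8, 8)); score[3, 5] = 1.0
ox = np.full((8, 8), 0.25)
oy = np.full((8, 8), 0.5)
print(decode_landmark(score, ox, oy))  # (168.0, 112.0)
```

Because the argmax and offset lookup both operate on the small feature map, the per-landmark decoding cost is independent of the input resolution, which is what makes the head cheap at inference time.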