Paper Title
Adaptively Enhancing Facial Expression Crucial Regions via Local Non-Local Joint Network
Paper Authors
Paper Abstract
Facial expression recognition (FER) remains a challenging task due to the small inter-class discrepancy in facial expression data. In view of the significance of crucial facial regions for FER, many existing studies exploit prior information from annotated crucial points (facial landmarks) to improve FER performance. However, manually annotating facial crucial points is complicated and time-consuming, especially for vast numbers of in-the-wild expression images. Motivated by this, a local non-local joint network is proposed in this paper to adaptively enhance the crucial facial regions during feature learning for FER. In the proposed method, two parts are constructed based on facial local and non-local information respectively: an ensemble of multiple local networks is proposed to extract local features corresponding to multiple facial local regions, and a non-local attention network is designed to explore the significance of each local region. In particular, the attention weights obtained by the non-local network are fed into the local part to achieve interactive feedback between the facial global and local information. Interestingly, the non-local weights corresponding to the local regions are gradually updated, and higher weights are assigned to more crucial regions. Moreover, U-Net is employed to extract features that integrate the deep semantic information and low-level detail information of expression images. Finally, experimental results illustrate that the proposed method achieves more competitive performance than several state-of-the-art methods on five benchmark datasets. Notably, analyses of the non-local weights corresponding to the local regions demonstrate that the proposed method can automatically enhance crucial regions during feature learning without any facial landmark information.
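The following is a minimal PyTorch sketch of the local non-local joint idea described in the abstract: an ensemble of local branches over fixed facial sub-regions, a non-local attention branch that scores the importance of each region, and a fusion step in which the attention weights are fed back into the local features. The backbone, region layout, channel sizes, and fusion rule (class `LocalNonLocalJointNet`, a 2x2 region split, softmax weights) are illustrative assumptions, not the authors' exact architecture.

```python
# Sketch of a local non-local joint network for FER, assuming a small conv
# backbone in place of the paper's U-Net and a 2x2 grid of facial sub-regions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalNonLocalJointNet(nn.Module):
    def __init__(self, num_classes=7, num_regions=4, feat_dim=64):
        super().__init__()
        # Shared feature extractor; the paper uses a U-Net-style network to
        # fuse deep semantic and shallow detail features (approximated here).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One small local network per facial sub-region (ensemble of local branches).
        self.local_nets = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(feat_dim, feat_dim), nn.ReLU())
            for _ in range(num_regions)
        )
        # Non-local attention branch: scores each region's importance from the
        # global feature map (assumed implementation).
        self.nonlocal_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, num_regions)
        )
        self.classifier = nn.Linear(feat_dim * num_regions, num_classes)

    def split_regions(self, fmap):
        # Split the feature map into a 2x2 grid as a stand-in for facial
        # sub-regions (eyes, nose, mouth, etc.); the paper's layout may differ.
        _, _, h, w = fmap.shape
        return [fmap[:, :, :h // 2, :w // 2], fmap[:, :, :h // 2, w // 2:],
                fmap[:, :, h // 2:, :w // 2], fmap[:, :, h // 2:, w // 2:]]

    def forward(self, x):
        fmap = self.backbone(x)
        # Non-local weights, one per region; softmax keeps them comparable.
        weights = F.softmax(self.nonlocal_attn(fmap), dim=1)          # (B, R)
        # Local features, re-weighted by the non-local attention: the
        # "interactive feedback" between global and local information.
        local_feats = [net(region) * weights[:, i:i + 1]
                       for i, (net, region) in
                       enumerate(zip(self.local_nets, self.split_regions(fmap)))]
        return self.classifier(torch.cat(local_feats, dim=1))


# Quick shape check on a dummy batch of 224x224 face crops.
if __name__ == "__main__":
    model = LocalNonLocalJointNet()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 7])
```

In this sketch the learned region weights play the role of the paper's adaptively enhanced crucial regions: during training, regions that contribute more to classification receive larger weights without requiring any landmark annotations.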