Disentanglement for Discriminative Visual Recognition

Paper Title

Disentanglement for Discriminative Visual Recognition

Paper Author

Liu, Xiaofeng

Abstract

Recent successes of deep learning-based recognition rely on maintaining the content related to the main-task label. However, how to explicitly dispel noisy signals in a controllable manner for better generalization remains an open issue. For instance, various factors such as identity-specific attributes, pose, illumination, and expression affect the appearance of face images. Disentangling the identity-specific factors is potentially beneficial for facial expression recognition (FER). This chapter systematically summarizes the detrimental factors as task-relevant/irrelevant semantic variations and unspecified latent variation. These problems are cast as either a deep metric learning problem or an adversarial minimax game in the latent space. For the former, a generalized adaptive (N+M)-tuplet clusters loss function, together with identity-aware hard-negative mining and an online positive mining scheme, can be used for identity-invariant FER. Better FER performance can be achieved by combining the deep metric loss and the softmax loss through joint optimization in a unified framework with two fully connected layer branches. For the latter, an end-to-end conditional adversarial network can be equipped with the ability to decompose an input sample into three complementary parts. The discriminative representation inherits the desired invariance property guided by prior knowledge of the task, and is marginally independent of the task-relevant/irrelevant semantic and latent variations. The framework achieves top performance on a series of tasks, including lighting-, makeup-, and disguise-tolerant face recognition and facial attribute recognition. This chapter systematically summarizes popular and practical solutions for disentanglement to achieve more discriminative visual recognition.
