Paper Title
A Deep Learning Framework to Reconstruct Face under Mask
Paper Authors
Paper Abstract
While deep learning-based image reconstruction methods have been very successful at removing objects from pictures, they have yet to achieve acceptable results in preserving the consistency of gender, ethnicity, expression, and other characteristics such as the topological structure of the face. The purpose of this work is to extract the mask region from a masked face image and reconstruct the detected area. The problem is complex because (i) it is difficult to determine the gender of a face hidden behind a mask, which can confuse the network into reconstructing a male face as female or vice versa; (ii) images may be captured from multiple angles, making it extremely difficult to preserve the actual shape and topological structure of the face and to produce a natural-looking image; and (iii) masks vary in form, so in some cases the mask area cannot be predicted precisely and parts of the mask remain on the face after completion. To solve this task, we split the problem into three phases: landmark detection, object detection for the targeted mask area, and inpainting of the detected mask region. First, we use gender classification to detect the actual gender behind the mask, and then detect the landmarks of the masked facial image. Second, we identify the non-face item, i.e., the mask, and use a Mask R-CNN network to produce a binary segmentation map of the observed mask area. Third, we develop a GAN-based inpainting network that uses the predicted landmarks as structural guidance to generate realistic images. The studies presented in this paper use the FFHQ and CelebA datasets.
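As a rough illustration of how a binary mask map and an inpainting output can fit together, the sketch below shows a common compositing step: generated pixels are kept only inside the mask region, and the original pixels are kept everywhere else. The abstract does not state that the authors composite in this way, so this is an assumption. The function name `composite` and the toy arrays are hypothetical stand-ins for the Mask R-CNN segmentation map and the GAN output.

```python
import numpy as np


def composite(masked_face, generated, mask):
    """Blend generated pixels into the mask region only (hypothetical helper).

    masked_face: H x W x 3 float array, the input image with the mask on.
    generated:   H x W x 3 float array, standing in for the inpainting output.
    mask:        H x W array, 1 where the face mask was detected, 0 elsewhere.
    """
    mask = mask.astype(np.float32)
    if mask.ndim == 2:
        # Add a channel axis so the mask broadcasts over the RGB channels.
        mask = mask[..., None]
    return mask * generated + (1.0 - mask) * masked_face


# Toy data: a 4 x 4 image whose lower half is covered by the face mask.
masked_face = np.full((4, 4, 3), 0.2, dtype=np.float32)
generated = np.full((4, 4, 3), 0.8, dtype=np.float32)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[2:, :] = 1

result = composite(masked_face, generated, mask)
```

In this toy case the upper half of `result` keeps the input pixels and the lower half takes the generated ones. A compositing step like this leaves pixels outside the detected region untouched, which is one reason the accuracy of the binary mask matters: any mask pixels the segmentation misses stay in the final image.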