Computer Science, 2022, Vol. 49, Issue (2): 51-61. doi: 10.11896/jsjkx.210400108

• Computer Vision: Theory and Application •

Research Progress of Face Editing Based on Deep Generative Model

TANG Yu-xiao, WANG Bin-jun   

  1. College of Information Network Security, People's Public Security University of China, Beijing 100038, China
  • Received: 2021-04-12 Revised: 2021-07-06 Online: 2022-02-15 Published: 2022-02-23
  • Corresponding author: WANG Bin-jun (wangbinjun@ppsuc.edu.cn)
  • About author: TANG Yu-xiao, born in 1997, master. Her main research interests include artificial intelligence and face editing. E-mail: 851637579@qq.com.
    WANG Bin-jun, born in 1962, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. His main research interests include cyber security and artificial intelligence.
  • Supported by:
    Key Program of National Social Science Foundation (20AZD114), CCF-NSFOCUS “Kunpeng” Scientific Research Fund (CCF-NSFOCUS 2020011) and Open Research Fund of the Public Security Behavioral Science Laboratory, People's Public Security University of China (2020sys08).


Abstract: Face editing is widely used in fields such as public security fugitive pursuit and face beautification. Traditional statistical methods and prototype-based methods have been the main means of face editing, but these traditional techniques suffer from problems such as difficult operation and high computational cost. In recent years, the rapid development of deep learning, and in particular the emergence of generative networks, has provided a brand-new approach to face editing; face editing techniques based on deep generative models offer fast speed and strong model generalization ability. To summarize and review recent theories and research on applying deep generative models to face editing, we first introduce the network frameworks and principles adopted by face editing techniques based on deep generative models. Then, the methods used by these techniques are described in detail and grouped into three categories: image-to-image translation, introducing conditional information inside the network, and manipulating the latent space. Finally, we summarize the challenges faced by these techniques, namely identity consistency, attribute disentanglement, and attribute editing accuracy, and point out several open issues that urgently need to be resolved in the future.
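
To make the third category concrete, the following minimal sketch illustrates how latent-space manipulation can edit a single facial attribute: a linear attribute direction is estimated from latent codes with known attribute labels, and a code is shifted along that direction before being fed to a pretrained generator. The generator G, the sampled codes, and the attribute labels are placeholder assumptions for illustration rather than any specific method surveyed here; approaches of this kind (e.g., InterfaceGAN) typically fit the direction with a linear classifier on the latents of a pretrained GAN such as StyleGAN.

```python
# Minimal latent-space editing sketch (illustrative assumptions: a pretrained
# generator G(z) -> image exists, and attribute labels for sampled latent codes
# are available from an off-the-shelf attribute classifier).
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 512

# Hypothetical sampled latent codes with binary attribute labels (e.g., "smiling").
z_samples = rng.standard_normal((1000, latent_dim))
labels = rng.integers(0, 2, size=1000)

# Estimate a linear editing direction as the normalized difference of the class
# centroids; a linear SVM fitted on (code, label) pairs is a common alternative.
direction = z_samples[labels == 1].mean(axis=0) - z_samples[labels == 0].mean(axis=0)
direction /= np.linalg.norm(direction)

def edit_latent(z: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Shift a latent code along the attribute direction.

    Positive strength adds the attribute, negative strength removes it,
    while the rest of the code (and ideally the identity) stays unchanged.
    """
    return z + strength * direction

z = rng.standard_normal(latent_dim)        # latent code of the face to be edited
z_edited = edit_latent(z, direction, 3.0)  # strengthen the target attribute
# The edited face would then be synthesized as G(z_edited) with the pretrained model.
```

How well such an edit works depends on how linearly separable the attribute is in the latent space without affecting other attributes, which relates directly to the attribute disentanglement and identity consistency challenges noted in the abstract.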

Key words: Deep learning, Face editing, GAN, Latent space, VAE

CLC number: TP309