Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 221000241-10. doi: 10.11896/jsjkx.221000241

• Image Processing & Multimedia Technology •

Controlled Facial Gender Forgery Combining Wavelet Transform High Frequency Information

CHEN Wanze, CHEN Jiazhen, HUANG Liqing, YE Feng, HUANG Tianqiang, LUO Haifeng   

  1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China
  • Published: 2023-11-09
  • Corresponding author: CHEN Jiazhen (jiazhen_chen@fjnu.edu.cn)
  • About author: CHEN Wanze, born in 1995, postgraduate (chen_lex@163.com). His main research interests include facial image synthesis and facial attribute manipulation.
    CHEN Jiazhen, born in 1971, associate professor. Her main research interest is information security.
  • Supported by:
    National Natural Science Foundation of China (62072106), Natural Science Foundation of Fujian Province, China (2020J01168, 2022J01190) and Scientific Research Foundation of the Education Department of Fujian Province, China (JAT210053).

Abstract: Image-to-image translation (I2I) based on generative adversarial networks (GAN) has achieved a series of breakthroughs in various fields and is widely applied to image synthesis, image colorization, and image super-resolution; it has been studied especially intensively for facial attribute manipulation. To address the performance disparity between different translation directions caused by model architecture and data imbalance in current I2I work, a high-frequency injection GAN (HFIGAN) model is proposed to achieve controlled facial gender forgery that combines high-frequency information. First, in the wavelet module that combines high-frequency information, the encoder features are decomposed at the feature level by discrete wavelet transform, and the resulting high-frequency information is injected into the decoder at the corresponding scales, so that information from the source and target domains remains balanced during upsampling. Second, to address the inconsistent translation difficulty across directions in multi-domain I2I conversion, the loss function is redesigned: the losses of hard and easy samples are rescaled to strengthen the feedback of hard samples, making the model focus on training hard samples and thus improving performance. Finally, a diversity regularization term based on style features is proposed, which adds distance metrics of style vectors in different spaces to the traditional diversity loss for supervision, enabling the model to improve generation quality while maintaining the diversity of generated images. Experiments on the CelebA-HQ and FFHQ datasets verify the effectiveness of the proposed method, and the generality of the proposed loss is verified by combining it with mainstream I2I models. Experimental results show that HFIGAN outperforms previous state-of-the-art methods in facial gender forgery and that the proposed loss function has a degree of generality.
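To make the three components above concrete, the following is a minimal PyTorch sketch of one plausible implementation. It is illustrative only: the names (haar_dwt, HighFreqInjector, focal_scaled_loss, style_diversity_reg), the Haar basis, and the exact form of the style-aware diversity term are assumptions, not the paper's released code; the rescaling of hard and easy samples follows Lin et al.'s focal loss, which the abstract's keywords name as the underlying idea.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_dwt(x):
    """Single-level 2D Haar DWT of a feature map x with shape (N, C, H, W).
    Returns the low-frequency band (N, C, H/2, W/2) and the three
    high-frequency detail bands stacked along channels (N, 3C, H/2, W/2)."""
    a = x[:, :, 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[:, :, 0::2, 1::2]  # top-right
    c = x[:, :, 1::2, 0::2]  # bottom-left
    d = x[:, :, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a + b - c - d) / 2  # horizontal detail
    hl = (a - b + c - d) / 2  # vertical detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, torch.cat([lh, hl, hh], dim=1)

class HighFreqInjector(nn.Module):
    """Hypothetical decoder-side injection block: projects the encoder's
    high-frequency bands and adds them to the decoder feature at the
    matching scale, so source-domain detail survives upsampling."""
    def __init__(self, enc_ch, dec_ch):
        super().__init__()
        self.proj = nn.Conv2d(3 * enc_ch, dec_ch, kernel_size=1)

    def forward(self, dec_feat, high_bands):
        h = self.proj(high_bands)
        if h.shape[-2:] != dec_feat.shape[-2:]:
            h = F.interpolate(h, size=dec_feat.shape[-2:], mode="nearest")
        return dec_feat + h  # additive injection of high-frequency detail

def focal_scaled_loss(logits, targets, gamma=2.0):
    """Focal-style rescaling of a per-sample BCE loss: easy samples
    (true-class probability near 1) are down-weighted by (1 - p_t)^gamma,
    so hard translation directions contribute more gradient."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability of the true class
    return ((1.0 - p_t) ** gamma * bce).mean()

def style_diversity_reg(img1, img2, s1, s2, eps=1e-8):
    """One possible reading of the style-based diversity term: image-space
    L1 distance normalized by the distance between the two sampled style
    codes, so output variation is supervised by style variation."""
    d_img = torch.mean(torch.abs(img1 - img2))
    d_sty = torch.mean(torch.abs(s1 - s2))
    return d_img / (d_sty + eps)  # maximized, e.g. subtracted from the total loss
```

Under these assumptions, the generator would call haar_dwt on each encoder feature, keep the high-frequency bands, and apply HighFreqInjector at the corresponding decoder layers, while focal_scaled_loss rescales the per-direction translation loss and style_diversity_reg replaces the traditional diversity loss.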

Key words: Image generation, Generative adversarial network, Image-to-image translation, Facial attribute manipulation, Focal loss

CLC number: TP391