Computer Science, 2023, Vol. 50, Issue (11A): 221000241-10. doi: 10.11896/jsjkx.221000241
陈万泽, 陈家祯, 黄丽清, 叶锋, 黄添强, 罗海峰
CHEN Wanze, CHEN Jiazhen, HUANG Liqing, YE Feng, HUANG Tianqiang, LUO Haifeng
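Abstract: Image-to-image translation (I2I) based on generative adversarial networks (GANs) has achieved a series of breakthroughs across many fields. It is widely used for image synthesis, image colorization and image super-resolution, and facial attribute manipulation in particular has been studied in depth. In current I2I methods, the model architecture and data imbalance cause the quality of generated images to differ between translation directions. To address this problem, we propose HFIGAN (High Frequency Injection GAN), which performs controllable facial gender forgery by incorporating high-frequency information. First, in a wavelet module that incorporates high-frequency information, the encoder features are decomposed at the feature level by the discrete wavelet transform (DWT), and the resulting high-frequency components are injected into the corresponding decoder stages, so that information from the source and target domains stays balanced during upsampling. Second, because multi-domain translation in I2I tasks is harder in some directions than in others, the loss function is redesigned to rescale the losses of easy and hard samples. This increases the feedback from hard samples, so that training concentrates on them and model performance improves. Finally, we propose a diversity regularization term based on style features: a distance measure between style vectors in different spaces is added to the conventional diversity loss as supervision, so that the model improves image quality while preserving the diversity of the generated images. Experiments on the CelebA-HQ and FFHQ datasets verify the effectiveness of the proposed method, and the generality of the proposed losses is verified by adding them to mainstream I2I models. The results show that HFIGAN outperforms previous state-of-the-art methods in facial gender forgery and that the proposed loss functions generalize to a certain extent.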
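To make the wavelet step concrete, here is a minimal sketch of a single-level 2D DWT on one feature-map channel, followed by one possible form of "injection". The abstract does not say which wavelet HFIGAN uses or how the bands are injected, so the following are all assumptions:

- The Haar wavelet is used.
- Injection is an inverse DWT that combines the decoder's low-frequency map with the high-frequency bands of the encoder level at the same resolution.
- The helpers `haar_dwt2`, `haar_idwt2` and `inject_high_freq` are hypothetical names.
- Plain lists keep the sketch self-contained. A real model would apply the same arithmetic per channel on tensors.

```python
# Sketch only: single-level 2D Haar DWT / inverse DWT on one channel, plus a
# HYPOTHETICAL injection step (decoder low band + encoder high bands -> inverse DWT).
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(x: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Split an H x W map (H and W even) into four H/2 x W/2 bands."""
    ll: Matrix = []
    lh: Matrix = []
    hl: Matrix = []
    hh: Matrix = []
    for i in range(0, len(x), 2):
        rows: Tuple[List[float], ...] = ([], [], [], [])
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # low-pass in both directions
            rows[1].append((a - b + c - d) / 2)  # detail: differences along width
            rows[2].append((a + b - c - d) / 2)  # detail: differences along height
            rows[3].append((a - b - c + d) / 2)  # detail: diagonal differences
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def haar_idwt2(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Exact inverse of haar_dwt2: rebuild a map at twice the band resolution."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, u, v, t = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + u + v + t) / 2
            out[2 * i][2 * j + 1] = (s - u + v - t) / 2
            out[2 * i + 1][2 * j] = (s + u - v - t) / 2
            out[2 * i + 1][2 * j + 1] = (s - u - v + t) / 2
    return out


def inject_high_freq(decoder_low: Matrix, encoder_feat: Matrix) -> Matrix:
    """Hypothetical injection: upsample the decoder map (H/2 x W/2) by the
    inverse DWT, borrowing the high-frequency bands of the encoder map (H x W)."""
    _, lh, hl, hh = haar_dwt2(encoder_feat)
    return haar_idwt2(decoder_low, lh, hl, hh)


enc = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]]
ll, lh, hl, hh = haar_dwt2(enc)
print(ll)                                # [[7.0, 11.0], [23.0, 27.0]]
print(haar_idwt2(ll, lh, hl, hh) == enc)  # True: perfect reconstruction
```

Because the Haar transform reconstructs its input exactly, feeding the encoder's own low band back in returns the original map. In this reading, injection only changes the low-frequency content, which comes from the decoder. The fine detail carried by the high-frequency bands is preserved through upsampling.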
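The abstract describes rescaling the losses of easy and hard samples but does not give the formula. One well-known way to do this is the modulating factor of focal loss (Lin et al., 2017), sketched below on a binary cross-entropy term. Some assumptions apply:

- Using a focal-style factor at all is an assumption.
- The choice of loss it applies to, for example the generator's adversarial term computed from discriminator "real" probabilities, is also an assumption.
- The helper name `focal_bce` is hypothetical.

```python
import math
from typing import Sequence


def focal_bce(probs: Sequence[float], targets: Sequence[int],
              gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean focal-weighted binary cross-entropy (assumed form, not HFIGAN's exact loss).

    probs:   predicted probability of the positive class for each sample
    targets: 1 or 0 for each sample
    gamma:   focusing parameter; gamma = 0 reduces to plain BCE
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        p_t = p if y == 1 else 1.0 - p    # probability given to the correct label
        # (1 - p_t) ** gamma is small for easy samples (p_t near 1),
        # so hard samples dominate the averaged loss.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


easy, hard = 0.9, 0.1   # discriminator scores for two samples with target 1
print(focal_bce([easy], [1], gamma=0.0))  # plain BCE, about 0.105
print(focal_bce([easy], [1]))             # scaled by (1 - 0.9)**2 = 0.01
print(focal_bce([hard], [1]))             # scaled by (1 - 0.1)**2 = 0.81
```

With gamma = 2, the loss of the easy sample shrinks by a factor of 100 and the loss of the hard sample by only about 1.2. This shifts the gradient signal toward the harder translation directions. In the focal-loss formulation, gamma is tuned per task.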
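The conventional diversity loss in multi-domain I2I compares two outputs generated from the same input with different style codes. In StarGAN v2, it is the mean L1 distance between those outputs, and the generator maximizes it. The abstract adds a distance between style vectors to this term but does not give the exact form. The sketch below therefore shows the conventional term next to one assumed style-aware variant: a mode-seeking ratio (Mao et al., 2019) that divides the image distance by the style distance. The helper names and the flat-list representation are hypothetical.

```python
from typing import Sequence


def l1_mean(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized flat vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def diversity_sensitive(img1: Sequence[float], img2: Sequence[float]) -> float:
    """Conventional diversity term: mean L1 distance between two outputs of the
    same input under different styles. The generator maximizes it, so it
    enters the total loss with a negative weight."""
    return l1_mean(img1, img2)


def style_aware_diversity(img1: Sequence[float], img2: Sequence[float],
                          style1: Sequence[float], style2: Sequence[float],
                          eps: float = 1e-5) -> float:
    """ASSUMED style-aware variant (mode-seeking ratio): the change in the output
    is measured relative to how far apart the two style vectors are."""
    return l1_mean(img1, img2) / (l1_mean(style1, style2) + eps)


img_a, img_b = [0.0, 0.5, 1.0], [0.2, 0.5, 0.6]
print(diversity_sensitive(img_a, img_b))                           # about 0.2
print(style_aware_diversity(img_a, img_b, [0.0, 0.0], [1.0, 1.0]))  # about 0.2
print(style_aware_diversity(img_a, img_b, [0.0, 0.0], [0.1, 0.1]))  # about 2.0
```

Under the ratio form, a given output difference earns more credit when the styles are close. This discourages the generator from collapsing nearby styles to the same image. Whether HFIGAN uses a ratio, an additive term or several style spaces at once cannot be determined from the abstract.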