Computer Science, 2021, Vol. 48, Issue (2): 134-141. doi: 10.11896/jsjkx.200800201

• Computer Graphics & Multimedia •

Image Synthesis with Semantic Region Style Constraint

HU Yu-jie, CHANG Jian-hui, ZHANG Jian   

  1. Shenzhen Graduate School, Peking University, Shenzhen, Guangdong 518055, China
  • Received: 2020-08-29 Revised: 2020-10-02 Online: 2021-02-15 Published: 2021-02-04
  • About author: HU Yu-jie, born in 1999, postgraduate. Her main research interests include image synthesis.
    ZHANG Jian, born in 1985, Ph.D., assistant professor, is a member of China Computer Federation. His main research interests include intelligent multimedia processing, deep learning and optimization, and computer vision.
  • Supported by:
    The National Natural Science Foundation of China (61902009) and Shenzhen Research Project (201806080921419290).

Abstract: In recent years, generative adversarial networks have developed rapidly, and image synthesis has become an active research direction. In particular, combining semantic region segmentation with generative models offers a new approach to image synthesis: by editing the input semantic segmentation mask, the desired realistic image with a specific style can be generated. However, current techniques cannot precisely control the style content of each semantic region. This paper proposes a novel framework for image synthesis under a semantic region style constraint, which realizes adaptive per-region style control using a conditional generative model. First, a style encoder extracts the style information of the different semantic regions based on the obtained semantic segmentation mask. Then, at the generation end, the style information and the semantic mask are each affine-transformed into a set of modulation parameters for every residual block by means of adaptive normalization. The semantic feature map fed into the generator is modulated by a weighted sum of these two sets of parameters, which effectively combines the semantic and style information, and the target style content is generated gradually through convolution and up-sampling. Finally, to address the problem that existing models cannot accurately control the style of each semantic region, a new style constraint loss function is designed to constrain per-region style changes at the semantic level and to reduce the mutual influence between the style codes of different semantic regions. In addition, weight quantization is used to compress the generator by about 15.6%, effectively reducing the storage size and space footprint of the network without performance degradation. Experimental results show that the proposed model improves significantly over existing methods, both perceptually and quantitatively, with the FID score improving by about 3.8% over the state-of-the-art model.
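
The modulation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the blending weight `alpha`, the single-channel flat feature layout, and the helper names `per_pixel`, `blend`, and `modulate` are all assumptions made for illustration. The sketch normalizes one feature channel, then applies a scale and shift obtained as a weighted sum of style-derived and mask-derived parameters.

```python
from statistics import fmean, pstdev

def per_pixel(region_values, region_ids):
    """Broadcast one value per semantic region to every pixel of that region."""
    return [region_values[r] for r in region_ids]

def blend(style_param, mask_param, alpha):
    """Weighted sum of style-derived and mask-derived modulation parameters.
    alpha is a hypothetical blending weight (assumption, not from the paper)."""
    return [alpha * s + (1.0 - alpha) * m for s, m in zip(style_param, mask_param)]

def modulate(features, gamma_style, beta_style, gamma_mask, beta_mask,
             alpha=0.5, eps=1e-5):
    """Normalize one channel, then apply the blended per-pixel scale and shift."""
    mean = fmean(features)
    std = pstdev(features)
    normalized = [(x - mean) / (std + eps) for x in features]
    gamma = blend(gamma_style, gamma_mask, alpha)
    beta = blend(beta_style, beta_mask, alpha)
    return [g * n + b for g, n, b in zip(gamma, normalized, beta)]

# Toy example: four pixels, two semantic regions (ids 0 and 1).
features = [1.0, 2.0, 3.0, 4.0]
region_ids = [0, 0, 1, 1]
gamma_style = per_pixel({0: 1.5, 1: 0.5}, region_ids)   # per-region style scale
beta_style = per_pixel({0: 0.2, 1: -0.2}, region_ids)   # per-region style shift
gamma_mask = [1.0] * 4                                  # mask-derived scale
beta_mask = [0.0] * 4                                   # mask-derived shift
out = modulate(features, gamma_style, beta_style, gamma_mask, beta_mask)
```

Because the style parameters are looked up by region id, changing the style value for one region alters only the pixels of that region, which is the kind of per-region control the abstract aims for.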

Key words: Adaptive normalization, Conditional generative model, Deep learning, Generative adversarial networks, Image synthesis

CLC Number: TP391