Computer Science, 2021, Vol. 48, Issue 2: 134-141. doi: 10.11896/jsjkx.200800201

• Computer Graphics & Multimedia •


Image Synthesis with Semantic Region Style Constraint

HU Yu-jie, CHANG Jian-hui, ZHANG Jian   

  1. Shenzhen Graduate School, Peking University, Shenzhen, Guangdong 518055, China
  • Received: 2020-08-29  Revised: 2020-10-02  Online: 2021-02-15  Published: 2021-02-04
  • Corresponding author: ZHANG Jian (zhangjian.sz@pku.edu.cn)
  • About author: HU Yu-jie, born in 1999, postgraduate (hhuyujie@163.com). Her main research interests include image synthesis.
    ZHANG Jian, born in 1985, Ph.D., assistant professor, is a member of the China Computer Federation. His main research interests include intelligent multimedia processing, deep learning and optimization, and computer vision.
  • Supported by:
    The National Natural Science Foundation of China (61902009) and the Shenzhen Science and Technology Research and Development Project (201806080921419290).

Abstract: Generative adversarial networks have developed rapidly in recent years, and the combination of semantic region segmentation with generative models offers a new direction for image synthesis research. In current work, semantic information serves as the condition that guides generation: by editing and controlling the input semantic segmentation mask, an image with a desired, specific style can be generated. However, existing techniques struggle to control the style of each semantic region precisely. This paper proposes an image synthesis framework with semantic region style constraints that uses a conditional generative adversarial network to achieve adaptive, per-region style control. Specifically, the semantic segmentation map of the image is first obtained, and a style encoder extracts the style information of each semantic region. At the generation end, the style information and the semantic mask are then affine-transformed into two sets of modulation parameters for each residual block of the generator. Finally, the semantic feature maps fed into the generator are modulated at each residual block by a weighted sum of these parameters and are progressively turned into the target-style content through convolution and up-sampling, effectively combining semantic and style information. To address the difficulty existing models have in precisely controlling the style of each semantic region, a new style constraint loss is designed that constrains style changes at the semantic level and reduces the mutual influence between the style codes of different semantic regions. In addition, without degrading performance, weight quantization is used to compress the storage of the generator's parameters to 15.6% of the original size, effectively reducing the model's storage footprint. Experimental results show that the proposed model significantly improves generation quality over existing methods in both subjective perception and objective metrics, with an FID score about 3.8% better than that of the current state-of-the-art model.
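The per-region modulation described above can be pictured with a short sketch. The following PyTorch code is a minimal illustration under stated assumptions, not the paper's implementation: the module name RegionAdaptiveNorm, the hidden width, and the learned blending weight alpha are hypothetical. It only shows how one set of modulation parameters can be predicted from the semantic mask, another from per-region style codes, and the two combined by a weighted sum before modulating the normalized feature map of a residual block.

```python
# Minimal sketch of region-adaptive modulation; layer widths, module names and
# the blending weight are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAdaptiveNorm(nn.Module):
    def __init__(self, feat_channels, num_regions, style_dim, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        # Modulation parameters predicted from the semantic mask.
        self.mask_shared = nn.Sequential(
            nn.Conv2d(num_regions, hidden, 3, padding=1), nn.ReLU())
        self.mask_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.mask_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        # Modulation parameters predicted from per-region style codes.
        self.style_gamma = nn.Linear(style_dim, feat_channels)
        self.style_beta = nn.Linear(style_dim, feat_channels)
        self.alpha = nn.Parameter(torch.tensor(0.0))  # learned blending weight

    def forward(self, feat, mask, style_codes):
        # feat: (B, C, H, W); mask: (B, R, H, W) one-hot; style_codes: (B, R, S)
        mask = F.interpolate(mask, size=feat.shape[2:], mode='nearest')
        normalized = self.norm(feat)

        h = self.mask_shared(mask)
        gamma_mask, beta_mask = self.mask_gamma(h), self.mask_beta(h)

        # Broadcast each region's style modulation over that region's pixels.
        gamma_style = torch.einsum('brc,brhw->bchw', self.style_gamma(style_codes), mask)
        beta_style = torch.einsum('brc,brhw->bchw', self.style_beta(style_codes), mask)

        # Weighted sum of the two sets of modulation parameters.
        a = torch.sigmoid(self.alpha)
        gamma = a * gamma_style + (1 - a) * gamma_mask
        beta = a * beta_style + (1 - a) * beta_mask
        return normalized * (1 + gamma) + beta

if __name__ == "__main__":
    B, C, R, S, H, W = 2, 64, 5, 256, 32, 32
    layer = RegionAdaptiveNorm(C, R, S)
    feat = torch.randn(B, C, H, W)
    mask = F.one_hot(torch.randint(0, R, (B, H, W)), R).permute(0, 3, 1, 2).float()
    styles = torch.randn(B, R, S)
    print(layer(feat, mask, styles).shape)  # torch.Size([2, 64, 32, 32])
```

In the generator described above, such a block would sit inside each residual block, with convolution and up-sampling stages between blocks progressively growing the output resolution.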

Key words: Adaptive normalization, Conditional generative model, Deep learning, Generative adversarial networks, Image synthesis
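The style constraint loss is described above only at a high level. As one plausible reading (an assumption, not the paper's exact formulation), the sketch below re-encodes the generated image region by region with masked average pooling and penalizes the L1 distance to the target style codes, so that editing one region's code should not disturb the codes recovered from the other regions.

```python
# Hedged sketch of a per-region style constraint; the pooling-based encoder and
# the L1 form are assumptions, not the paper's exact loss.
import torch
import torch.nn.functional as F

def region_style_codes(feat, mask, eps=1e-6):
    """Average-pool features inside each semantic region.
    feat: (B, C, H, W); mask: (B, R, H, W) one-hot -> style codes (B, R, C)."""
    mask = F.interpolate(mask, size=feat.shape[2:], mode='nearest')
    area = mask.sum(dim=(2, 3)).clamp_min(eps)              # (B, R)
    pooled = torch.einsum('bchw,brhw->brc', feat, mask)
    return pooled / area.unsqueeze(-1)

def style_constraint_loss(gen_feat, mask, target_codes):
    """Penalize deviation of the generated image's per-region codes from the
    codes extracted from the reference image."""
    return F.l1_loss(region_style_codes(gen_feat, mask), target_codes)

if __name__ == "__main__":
    B, C, R, H, W = 2, 64, 5, 32, 32
    gen_feat = torch.randn(B, C, H, W)   # e.g. style-encoder features of the generated image
    mask = F.one_hot(torch.randint(0, R, (B, H, W)), R).permute(0, 3, 1, 2).float()
    target_codes = torch.randn(B, R, C)
    print(style_constraint_loss(gen_feat, mask, target_codes).item())
```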

CLC Number: TP391
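For the storage figure quoted above (generator parameters compressed to 15.6% of their original size), a shared-codebook weight quantization in the spirit of Deep Compression gives a quick sanity check: storing each 32-bit float weight as a k-bit cluster index keeps roughly k/32 of the original size. The 5-bit setting below is an assumption chosen because 5/32 ≈ 15.6%; the abstract does not state the actual bit width or quantization scheme.

```python
# Back-of-the-envelope storage ratio for shared-codebook weight quantization.
# The 5-bit index width is an assumption used only to illustrate the arithmetic.
def quantized_storage_ratio(num_weights, index_bits, float_bits=32):
    """Return (indices + codebook) storage divided by dense float storage."""
    codebook_size = 2 ** index_bits                  # one float per cluster centre
    original = num_weights * float_bits
    quantized = num_weights * index_bits + codebook_size * float_bits
    return quantized / original

if __name__ == "__main__":
    print(f"{quantized_storage_ratio(10_000_000, 5):.1%}")  # ~15.6% (5/32 plus a tiny codebook)
```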