Generation of Realistic Image from Text Based on Feature Fusion

Computer Science, 2021, Vol. 48, Issue (6): 125-130. doi: 10.11896/jsjkx.200400107

• Computer Graphics & Multimedia •

XU Ze, SHUAI Ren-jun, LIU Kai-kai, MA Li, WU Meng-lin

College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China

Received: 2020-04-23  Revised: 2020-09-07  Online: 2021-06-15  Published: 2021-06-03
Corresponding author: SHUAI Ren-jun (srjwhy@sina.com)
About the authors: XU Ze, born in 1994, postgraduate. His main research interests include image processing and machine learning. (1401283266@qq.com)
SHUAI Ren-jun, born in 1962, postgraduate, associate professor. His main research interests include artificial intelligence and intelligent medical care.
Supported by: National Natural Science Foundation of China (61701222).


Abstract: In recent years, methods based on generative adversarial networks (GANs) have achieved encouraging results on the challenging task of synthesizing images from text descriptions. Although these methods can produce images with the right general shapes and colors, they often produce globally distorted images with unnatural local details. There are two reasons: convolutional neural networks are inefficient at capturing the high-level semantic information needed for pixel-level image synthesis, and the generator-discriminator pair at the coarse stage lacks detailed information and therefore produces flawed results, which are then used as input for generating the final image. We therefore propose a generative adversarial network based on feature fusion. It introduces multi-scale feature fusion by embedding a residual block feature pyramid structure, directly generates the final fine-grained image by adaptively fusing these features, and produces realistic 256px×256px images with only one discriminator. The proposed method is evaluated on the Oxford-102 flower dataset and the Caltech bird dataset CUB, and the quality of the generated images is measured with the Inception Score and FID. The results show that the proposed method generates images of clearly higher quality than several classical methods.
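The abstract states that the generator adaptively fuses multi-scale features from a residual block feature pyramid, but it does not give the fusion rule. The following is a minimal sketch under one assumption: every pyramid level is upsampled to the finest resolution and the levels are summed with softmax-normalized per-level weights. Single-channel square maps stored as nested lists keep it dependency-free. The names `upsample_nearest`, `adaptive_fuse` and the `logits` parameter are hypothetical, not the authors' code, and the residual blocks that would produce each level are omitted.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one square, single-channel map indexed [row][col]


def upsample_nearest(fm: FeatureMap, size: int) -> FeatureMap:
    """Nearest-neighbour upsampling; assumes `size` is a multiple of the map's side."""
    scale = size // len(fm)
    return [[fm[r // scale][c // scale] for c in range(size)] for r in range(size)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(pyramid: List[FeatureMap], logits: List[float]) -> FeatureMap:
    """Weighted sum of all pyramid levels at the finest resolution.

    `logits` holds one (hypothetically learnable) scalar per level; the softmax
    makes the weights non-negative and makes them sum to 1.
    """
    size = max(len(fm) for fm in pyramid)
    fused = [[0.0] * size for _ in range(size)]
    for w, fm in zip(softmax(logits), pyramid):
        up = upsample_nearest(fm, size)
        for r in range(size):
            for c in range(size):
                fused[r][c] += w * up[r][c]
    return fused


# Usage: a coarse 1x1 level and a finer 2x2 level with equal weights.
coarse = [[4.0]]
fine = [[0.0, 2.0], [4.0, 6.0]]
print(adaptive_fuse([coarse, fine], [0.0, 0.0]))  # [[2.0, 3.0], [4.0, 5.0]]
```

With equal logits each level contributes half of every output value. In an actual generator the maps would have many channels and the weights would be trained jointly with the rest of the network, so the model itself could decide how much coarse layout and fine detail to keep.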

Key words: Discriminator, Feature fusion, Generative adversarial network, Residual block feature pyramid
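The paper evaluates image quality with the Inception Score and FID. Both are computed from the outputs of a pretrained Inception network: IS from the per-image class distributions p(y|x), as exp(E_x[KL(p(y|x) || p(y))]), where higher is better, and FID from the mean and covariance of Inception features of real and generated images, as ||μ1 - μ2||² + Tr(Σ1 + Σ2 - 2(Σ1Σ2)^(1/2)), where lower is better. The sketch below implements only these two formulas in pure Python. It assumes the class probabilities and feature statistics have already been extracted, and its Jacobi eigen-solver is a stand-in for the matrix square root a numerical library would normally provide.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def inception_score(probs: Sequence[Sequence[float]]) -> float:
    """IS = exp(mean over images of KL(p(y|x) || p(y))); probs[i][j] = p(class j | image i)."""
    n, k = len(probs), len(probs[0])
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]  # p(y)
    mean_kl = 0.0
    for p in probs:
        kl = sum(pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0)
        mean_kl += kl / n
    return math.exp(mean_kl)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def jacobi_eigh(a: Matrix, sweeps: int = 100, tol: float = 1e-22) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def sqrtm_psd(a: Matrix) -> Matrix:
    """Square root V diag(sqrt(w)) V^T of a symmetric PSD matrix; tiny negative eigenvalues are clipped."""
    w, v = jacobi_eigh(a)
    n = len(a)
    r = [math.sqrt(max(x, 0.0)) for x in w]
    return [[sum(v[i][k] * r[k] * v[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


def fid(mu1: Sequence[float], s1: Matrix, mu2: Sequence[float], s2: Matrix) -> float:
    """Frechet distance ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)) between two Gaussians."""
    n = len(mu1)
    mean_term = sum((x - y) ** 2 for x, y in zip(mu1, mu2))
    h = sqrtm_psd(s1)
    m = matmul(matmul(h, s2), h)  # S1^(1/2) S2 S1^(1/2) has the same eigenvalues as S1 S2
    m = [[(m[i][j] + m[j][i]) / 2.0 for j in range(n)] for i in range(n)]  # drop round-off asymmetry
    w, _ = jacobi_eigh(m)
    tr_covmean = sum(math.sqrt(max(x, 0.0)) for x in w)
    return mean_term + sum(s1[i][i] + s2[i][i] for i in range(n)) - 2.0 * tr_covmean
```

For example, two confident predictions on different classes give `inception_score([[1.0, 0.0], [0.0, 1.0]])` of about 2, the number of classes, while identical feature statistics give an FID of about 0. In practice the statistics come from thousands of images, and a library routine for the matrix square root would replace the hand-written solver.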

CLC number: TP391


XU Ze, SHUAI Ren-jun, LIU Kai-kai, MA Li, WU Meng-lin. Generation of Realistic Image from Text Based on Feature Fusion[J]. Computer Science, 2021, 48(6): 125-130. https://doi.org/10.11896/jsjkx.200400107

