Computer Science (计算机科学) ›› 2022, Vol. 49 ›› Issue (1): 233-240. doi: 10.11896/jsjkx.201100207
ZHANG Wei-qi (张玮琪)1,2, TANG Yi-feng (汤轶丰)1,2, LI Lin-yan (李林燕)3, HU Fu-yuan (胡伏原)1,4
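Abstract: Generative adversarial networks can already produce fairly high-quality images for the task of generating image sequences from paragraphs. However, when the input text involves multiple objects and relationships, the contextual information of the text sequence is hard to extract, the object layout of the generated images tends to become disordered, and the generated objects lack detail. To address this problem, this paper proposes a scene-graph-based method for generating image sequences from paragraphs, built on StoryGAN. First, the paragraph is converted into multiple scene graphs through graph convolution, each containing the object and relationship information of the corresponding text. Next, object bounding boxes and segmentation masks are predicted to compute the scene layout. Finally, image sequences that better match the objects and their relationships are generated from the scene layout and the contextual information. Tested on the CLEVR-SV and CoDraw-SV datasets, the method generates 64×64-pixel image sequences containing multiple objects and their relationships. Experimental results show that on CLEVR-SV the proposed method improves SSIM and FID over StoryGAN by 1.34% and 9.49%, respectively; on CoDraw-SV it improves ACC over StoryGAN by 7.40%. The proposed method makes the layout of generated scenes more reasonable: it can generate image sequences containing multiple objects and relationships, with higher image quality and clearer details.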
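The layout step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the names `SceneObject`, `compose_layout`, and `is_left_of` are hypothetical, the bounding boxes are hand-written here rather than predicted by a network, and segmentation masks are replaced by filled boxes for simplicity. It only shows how a scene graph (objects plus relationship triples) and per-object boxes could be rasterized into a 64×64 layout grid.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    """One node of a scene graph (hypothetical helper type)."""
    name: str
    box: tuple  # (x0, y0, x1, y1), normalized to [0, 1]; assumed, not predicted


def compose_layout(objects, size=64):
    """Rasterize object boxes into a size x size grid of object indices.

    Background cells hold -1. Later objects overwrite earlier ones where
    boxes overlap. Filled boxes stand in for segmentation masks.
    """
    layout = [[-1] * size for _ in range(size)]
    for idx, obj in enumerate(objects):
        x0, y0, x1, y1 = obj.box
        c0, c1 = max(int(x0 * size), 0), min(int(x1 * size), size)
        r0, r1 = max(int(y0 * size), 0), min(int(y1 * size), size)
        for r in range(r0, r1):
            for c in range(c0, c1):
                layout[r][c] = idx
    return layout


def is_left_of(subject, other):
    """Check a 'left of' relation by comparing horizontal box centers."""
    subject_cx = (subject.box[0] + subject.box[2]) / 2
    other_cx = (other.box[0] + other.box[2]) / 2
    return subject_cx < other_cx


# A toy scene graph for one sentence: "a cube is left of a sphere".
objects = [
    SceneObject("cube", (0.0, 0.25, 0.5, 0.75)),
    SceneObject("sphere", (0.5, 0.25, 1.0, 0.75)),
]
triples = [(0, "left of", 1)]  # (subject index, predicate, object index)

layout = compose_layout(objects, size=64)
relations_hold = all(
    is_left_of(objects[s], objects[o])
    for s, pred, o in triples
    if pred == "left of"
)
```

In a full pipeline of the kind the abstract outlines, one such layout would be computed per scene graph, and a generator would consume it together with the contextual information to produce each image in the sequence.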