计算机科学 ›› 2022, Vol. 49 ›› Issue (1): 233-240.doi: 10.11896/jsjkx.201100207

• 计算机图形学&多媒体 •

基于场景图的段落生成序列图像方法

张玮琪1,2, 汤轶丰1,2, 李林燕3, 胡伏原1,4   

    1 苏州科技大学电子与信息工程学院 江苏 苏州215009
    2 苏州科技大学苏州市大数据与信息服务重点实验室 江苏 苏州215009
    3 苏州经贸职业技术学院 江苏 苏州215009
    4 苏州科技大学苏州市虚拟现实智能交互及应用技术重点实验室 江苏 苏州215009
  • 收稿日期:2020-11-30 修回日期:2021-05-26 出版日期:2022-01-15 发布日期:2022-01-18
  • 通讯作者: 胡伏原(fuyuanhu@mail.usts.edu.cn)
  • 作者简介:weiqizhang1997@163.com
  • 基金资助:
    国家自然科学基金(61876121);江苏省重点研发计划项目(BE2017663);江苏省教育厅高等学校自然科学研究面上项目(19KJB520054);江苏省研究生实践创新项目(SJCX20_1119)

Image Stream From Paragraph Method Based on Scene Graph

ZHANG Wei-qi1,2, TANG Yi-feng1,2, LI Lin-yan3, HU Fu-yuan1,4   

    1 School of Electronic & Information Engineering,Suzhou University of Science and Technology,Suzhou,Jiangsu 215009,China
    2 Suzhou Key Laboratory for Big Data and Information Service,Suzhou University of Science and Technology,Suzhou,Jiangsu 215009,China
    3 Suzhou Institute of Trade and Commerce,Suzhou,Jiangsu 215009,China
    4 Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou,Suzhou University of Science and Technology,Suzhou,Jiangsu 215009,China
  • Received:2020-11-30 Revised:2021-05-26 Online:2022-01-15 Published:2022-01-18
  • About author:ZHANG Wei-qi,born in 1997,postgraduate.Her main research interests include deep learning and computer vision.
    HU Fu-yuan,born in 1978,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include machine learning and computer vision.
  • Supported by:
    National Natural Science Foundation of China(61876121),Key Research and Development Program of Jiangsu Province(BE2017663),Natural Science Research Project of Jiangsu Higher Education Institutions(19KJB520054) and Postgraduate Research & Practice Innovation Program of Jiangsu Province(SJCX20_1119).

摘要: 通过生成对抗网络进行段落生成序列图像的任务已经可以生成质量较高的图像。然而当输入的文本涉及多个对象和关系时,文本序列的上下文信息难以提取,生成图像的对象布局容易产生混乱,生成的对象细节不足。针对该问题,文中在StoryGAN的基础上,提出了一种基于场景图的段落生成序列图像方法。首先,通过图卷积将段落转换为多个场景图,每个场景图包含对应文本的对象和关系信息;然后,预测对象的边界框和分割掩膜来计算生成场景布局;最后,根据场景布局和上下文信息生成更符合对象及其关系的序列图像。在CLEVR-SV和CoDraw-SV数据集上进行测试,该方法可以生成包含多个对象及其关系的64×64像素的序列图像。实验结果表明,在CLEVR-SV数据集上,所提方法的SSIM和FID比StoryGAN分别提升了1.34%和9.49%;在CoDraw-SV数据集上,所提方法的ACC比StoryGAN提高了7.40%。所提方法提高了生成场景的布局合理性,不仅可以生成包含多个对象和关系的图像序列,而且生成的图像质量更高,细节更清晰。

关键词: 场景布局, 生成对抗网络, 图卷积神经网络, 文本生成图像

Abstract: Generative adversarial networks can already produce fairly high-quality images for the task of generating image sequences from paragraphs.However,when the input text involves multiple objects and relationships,the contextual information of the text sequence is difficult to extract,the object layout of the generated images is prone to confusion,and the generated object details are insufficient.To address this problem,this paper builds on StoryGAN and proposes a scene-graph-based method for generating image sequences from paragraphs.First,the paragraph is converted into multiple scene graphs via graph convolution,each scene graph containing the object and relationship information of the corresponding text.Then,the bounding box and segmentation mask of each object are predicted to compute the scene layout.Finally,an image sequence that better matches the objects and their relationships is generated from the scene layout and the context information.Tests on the CLEVR-SV and CoDraw-SV datasets show that the proposed method can generate 64×64-pixel image sequences containing multiple objects and their relationships.Experimental results show that on the CLEVR-SV dataset,the SSIM and FID of the proposed method improve on StoryGAN by 1.34% and 9.49%,respectively;on the CoDraw-SV dataset,its ACC is 7.40% higher than StoryGAN's.The proposed method improves the rationality of the generated scene layout:it not only generates image sequences containing multiple objects and relationships,but also produces images of higher quality with clearer details.
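The pipeline the abstract describes (graph convolution over a scene graph, a box head per object, then composition into a coarse scene layout) can be sketched in a few lines. The sketch below is illustrative only, modeled on the graph-convolution formulation of Johnson et al. [8] rather than the paper's exact architecture: all weights are random stand-ins for trained parameters, and the names (`gconv`, `W_box`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # object/predicate embedding size (assumed)

# Toy scene graph for "a cube to the left of a sphere".
objects = ["cube", "sphere"]
triples = [(0, "left_of", 1)]  # (subject idx, predicate, object idx)

# Random embeddings stand in for learned word/object embeddings.
obj_vecs = rng.normal(size=(len(objects), D))
pred_vecs = {p: rng.normal(size=D) for _, p, _ in triples}

def gconv(obj_vecs, triples, pred_vecs, W):
    """One graph-convolution step: each triple (s, p, o) computes a
    message W @ [s; p; o] and sends it to both endpoint objects,
    which are replaced by the mean of their incoming messages."""
    msgs = [[] for _ in range(len(obj_vecs))]
    for s, p, o in triples:
        h = W @ np.concatenate([obj_vecs[s], pred_vecs[p], obj_vecs[o]])
        msgs[s].append(h)
        msgs[o].append(h)
    out = obj_vecs.copy()
    for i, m in enumerate(msgs):
        if m:
            out[i] = np.mean(m, axis=0)
    return out

W = rng.normal(size=(D, 3 * D)) / np.sqrt(3 * D)
obj_vecs = gconv(obj_vecs, triples, pred_vecs, W)

# Box head: map each object embedding to (x0, y0, x1, y1) in [0, 1].
W_box = rng.normal(size=(4, D)) / np.sqrt(D)
boxes = 1.0 / (1.0 + np.exp(-(obj_vecs @ W_box.T)))  # sigmoid

# Compose a coarse 64x64 scene layout: paste each object's mask
# (here simply a filled box; the paper predicts a real segmentation
# mask) into its predicted region.
S = 64
layout = np.zeros((S, S))
for i, (x0, y0, x1, y1) in enumerate(boxes):
    xa, xb = sorted((int(x0 * S), int(x1 * S)))
    ya, yb = sorted((int(y0 * S), int(y1 * S)))
    layout[ya:yb + 1, xa:xb + 1] = i + 1

print(layout.shape)  # (64, 64)
```

In the full model this layout tensor, together with the story context encoding, conditions the image generator; here it is only composed and printed to show the data flow from graph to spatial map.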

Key words: Generative adversarial networks, Graph convolutional network, Scene layout, Text-to-image synthesis

中图分类号: TP391
[1]KINGMA D P,WELLING M.Auto-encoding variational bayes[C]//Proceedings of the International Conference on Learning Representations.2014.
[2]BA J,MNIH V,KAVUKCUOGLU K.Multiple object recognition with visual attention[C]//International Conference on Learning Representations.2015.
[3]GOODFELLOW I,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial nets[C]//Advances in Neural Information Processing Systems.2014:2672-2680.
[4]REED S,AKATA Z,YAN X,et al.Generative adversarial text to image synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning.2016.
[5]XU T,ZHANG P,HUANG Q,et al.Attngan:Fine-grained text to image generation with attentional generative adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:1316-1324.
[6]LI W,ZHANG P,ZHANG L,et al.Object-driven text-to-image synthesis via adversarial training[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2019:12174-12182.
[7]LI Y,GAN Z,SHEN Y,et al.Storygan:A sequential conditional gan for story visualization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2019:6329-6338.
[8]JOHNSON J,GUPTA A,FEI-FEI L.Image generation from scene graphs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:1219-1228.
[9]XU D,ZHU Y,CHOY C B,et al.Scene graph generation by iterative message passing[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:5410-5419.
[10]YANG X,TANG K,ZHANG H,et al.Auto-encoding scene graphs for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2019:10685-10694.
[11]DHAMO H,FARSHAD A,LAINA I,et al.Semantic Image Manipulation Using Scene Graphs[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:5213-5222.
[12]SHI J,ZHANG H,LI J.Explainable and explicit visual reasoning over scene graphs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2019:8376-8384.
[13]LAN H,LIU Q Y.Image generation from scene graph with graph attention network[J].Journal of Image and Graphics,2020,25(8):1591-1603.
[14]JOHNSON J,HARIHARAN B,VAN DER MAATEN L,et al.Clevr:A diagnostic dataset for compositional language and elementary visual reasoning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:2901-2910.
[15]KIM J H,KITAEV N,CHEN X,et al.Codraw:Collaborative drawing as a testbed for grounded goal-driven communication[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.2019:6495-6513.
[16]CHEN Y,DAI X,LIU M,et al.Dynamic convolution:Attention over convolution kernels[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:11030-11039.
[17]LIU H,SOCHER R,XIONG C.Taming maml:Efficient unbiased meta-reinforcement learning[C]//International Conference on Machine Learning.2019:4061-4071.
[18]CHEN Q,KOLTUN V.Photographic image synthesis with cascaded refinement networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:1511-1520.
[19]ZHANG H,XU T,LI H,et al.Stackgan:Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:5907-5915.
[20]FU T J,WANG X,GRAFTON S,et al.Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing.2020:4413-4422.
[21]CHAN C,GINOSAR S,ZHOU T,et al.Everybody dance now[C]//Proceedings of the IEEE International Conference on Computer Vision.2019:5933-5942.
[22]TULYAKOV S,LIU M Y,YANG X,et al.Mocogan:Decomposing motion and content for video generation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:1526-1535.
[23]SHEN Y,GU J,TANG X,et al.Interpreting the latent space of gans for semantic face editing[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:9243-9252.
[24]ZITNICK C L,PARIKH D.Bringing semantics into focus using visual abstraction[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2013:3009-3016.
[25]SARA U,AKTER M,UDDIN M S.Image quality assessment through FSIM,SSIM,MSE and PSNR-a comparative study[J].Journal of Computer and Communications,2019,7(3):8-18.
[26]HEUSEL M,RAMSAUER H,UNTERTHINER T,et al.Gans trained by a two time-scale update rule converge to a local nash equilibrium[C]//Advances in Neural Information Processing Systems.2017:6626-6637.