Computer Science, 2022, Vol. 49, Issue (1): 233-240. doi: 10.11896/jsjkx.201100207

• Computer Graphics & Multimedia •

Image Stream From Paragraph Method Based on Scene Graph

ZHANG Wei-qi1,2, TANG Yi-feng1,2, LI Lin-yan3, HU Fu-yuan1,4   

    1 School of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
    2 Suzhou Key Laboratory for Big Data and Information Service, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
    3 Suzhou Institute of Trade and Commerce, Suzhou, Jiangsu 215009, China
    4 Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
  • Received: 2020-11-30 Revised: 2021-05-26 Online: 2022-01-15 Published: 2022-01-18
  • About author: ZHANG Wei-qi, born in 1997, postgraduate. Her main research interests include deep learning and computer vision.
    HU Fu-yuan, born in 1978, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include machine learning and computer vision.
  • Supported by:
    National Natural Science Foundation of China (11771421), CAS “Light of West China” Program, Chongqing Academician-led Science and Technology Innovation Guidance Project (cstc2018jcyj-yszxX0002, cstc2019yszx-jcyjX0003, cstc2020yszx-jcyjX0005) and National Key Research and Development Program (2020YFA0712300).

Abstract: Generative adversarial networks can already produce fairly high-quality results on the task of generating image sequences from paragraphs. However, when the input text involves multiple objects and relationships, the context information of the text sequence is difficult to extract, the object layout of the generated images is prone to confusion, and the generated object details are insufficient. To solve this problem, this paper proposes a scene-graph-based method, built on StoryGAN, for generating image sequences. First, the paragraph is converted into multiple scene graphs through graph convolution, and each scene graph contains the object and relationship information of the corresponding text. Then, the bounding box and segmentation mask of each object are predicted to compute the scene layout. Finally, according to the scene layout and the context information, an image sequence that better matches the objects and their relationships is generated. Tests on the CLEVR-SV and CoDraw-SV data sets show that the proposed method can generate 64×64-pixel image sequences containing multiple objects and their relationships. Experimental results show that, on the CLEVR-SV data set, the method improves SSIM and FID by 1.34% and 9.49%, respectively, over StoryGAN. On the CoDraw-SV data set, its ACC is 7.40% higher than that of StoryGAN. The proposed method makes the layout of the generated scenes more reasonable: it can generate an image sequence containing multiple objects and relationships, and the generated images have higher quality and clearer details.
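
The scene-layout step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each object already has a feature vector and a predicted bounding box, and it uses the box itself as a hard mask in place of a predicted segmentation mask. The names `SceneObject` and `compose_layout`, the 2-dimensional embeddings, and the 4×4 grid are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneObject:
    name: str
    embedding: List[float]                   # hypothetical object feature vector
    box: Tuple[float, float, float, float]   # (x0, y0, x1, y1), normalized to [0, 1]


def compose_layout(objects: List[SceneObject], size: int) -> List[List[List[float]]]:
    """Sum each object's embedding over the grid cells covered by its box.

    Returns a size x size grid whose cells hold feature vectors.
    Simplification: the box acts as a hard mask (no segmentation mask).
    """
    if not objects:
        raise ValueError("at least one object is required")
    dim = len(objects[0].embedding)
    layout = [[[0.0] * dim for _ in range(size)] for _ in range(size)]
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        for row in range(size):
            cy = (row + 0.5) / size          # vertical center of the cell
            if not (y0 <= cy < y1):
                continue
            for col in range(size):
                cx = (col + 0.5) / size      # horizontal center of the cell
                if x0 <= cx < x1:
                    cell = layout[row][col]
                    for k in range(dim):
                        cell[k] += obj.embedding[k]
    return layout


# A toy scene graph: objects plus (subject, predicate, object) triples.
cube = SceneObject("cube", [1.0, 0.0], (0.0, 0.25, 0.5, 0.75))
sphere = SceneObject("sphere", [0.0, 1.0], (0.25, 0.25, 1.0, 0.75))
triples = [("cube", "left of", "sphere")]

layout = compose_layout([cube, sphere], size=4)
```

In this toy example the two boxes overlap in one column, so cells there carry the sum of both embeddings, while cells outside every box stay zero. A downstream generator would consume such a layout together with the context information.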

Key words: Generative adversarial networks, Graph convolutional network, Scene layout, Text-to-image synthesis

CLC Number: 

  • TP391