Computer Science, 2020, Vol. 47, Issue (6): 133-137. doi: 10.11896/jsjkx.190600110

• Computer Graphics & Multimedia •

Scene Graph Generation Model Combining Attention Mechanism and Feature Fusion

HUANG Yong-tao, YAN Hua   

  1. School of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2019-06-20 Online: 2020-06-15 Published: 2020-06-10
  • About author: HUANG Yong-tao, born in 1995, postgraduate, is not a member of China Computer Federation. His main research interests include computer vision, deep learning and parallel computing.
    YAN Hua, born in 1971, Ph.D, professor. His main research interests include intelligent algorithms, storage systems and path planning.
  • Supported by:
    This work was supported by the Natural Science Foundation of China (61403265).

Abstract: Understanding a visual scene requires not only identifying individual objects in isolation but also capturing the interactions between them. A scene graph describes the object relationships in an image as a set of (subject-predicate-object) tuples and is widely used in image understanding tasks. Existing scene graph generation models rely on complicated structures and are slow at inference. To address this, a scene graph generation model is proposed that combines an attention mechanism and feature fusion on top of the Factorizable Net structure. First, an image is decomposed into subgraphs, each containing several objects and their relationships. Then, position and shape information is fused into the object features, and an attention mechanism is used to pass messages between the object features and the subgraph features. Finally, the object classes and the relationships between objects are inferred from the object features and the subgraph features. Experimental results on multiple visual relationship detection datasets show that the accuracy of visual relationship detection is 22.78% to 25.41% and the accuracy of scene graph generation is 16.39% to 22.75%, which are 1.2% and 1.8% higher than Factorizable Net, respectively. In addition, the proposed model performs the object relationship detection task in 0.6 seconds on a GTX 1080Ti graphics card. The results demonstrate that the subgraph structure significantly reduces the number of image regions to be inferred, and that feature fusion and the attention mechanism improve the quality of the deep features, so objects and their relationships can be predicted more quickly and accurately. The proposed model thus addresses the poor timeliness and low accuracy of traditional scene graph generation models.
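
The two middle steps of the abstract (fusing position and shape information into the object features, then passing messages from subgraph features to object features through attention) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how fusion or attention is computed, so concatenation of a normalized box descriptor, scaled dot-product attention, and a residual update are all assumptions made here for illustration, and every function name below is hypothetical.

```python
import math

def box_geometry(box, img_w, img_h):
    """Normalized position/shape descriptor for a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [
        (x1 + x2) / 2 / img_w,  # centre x, relative to image width
        (y1 + y2) / 2 / img_h,  # centre y, relative to image height
        (x2 - x1) / img_w,      # relative width
        (y2 - y1) / img_h,      # relative height
    ]

def fuse(appearance, box, img_w, img_h):
    """Feature fusion by concatenation (an assumption, not the paper's method)."""
    return list(appearance) + box_geometry(box, img_w, img_h)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, sources):
    """Scaled dot-product attention of one query over several source vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * s for q, s in zip(query, src)) / scale for src in sources]
    weights = softmax(scores)
    message = [
        sum(w * src[i] for w, src in zip(weights, sources))
        for i in range(len(query))
    ]
    return weights, message

def pass_messages(object_feats, subgraph_feats, links):
    """One round of subgraph-to-object message passing.

    links[i] lists the indices of the subgraphs that contain object i.
    Each object adds the attention-weighted message to its own feature.
    """
    updated = []
    for i, feat in enumerate(object_feats):
        linked = [subgraph_feats[j] for j in links[i]]
        if not linked:
            updated.append(list(feat))  # no subgraph: feature unchanged
            continue
        _, message = attend(feat, linked)
        updated.append([f + m for f, m in zip(feat, message)])
    return updated

# Toy usage: two objects with 2-d appearance features in a 100 x 200 image.
objects = [
    fuse([0.5, 1.0], (0, 0, 50, 100), 100, 200),
    fuse([1.0, 0.0], (50, 100, 100, 200), 100, 200),
]
# Subgraph features must have the same length as the fused object features.
subgraphs = [[0.1] * 6, [0.3] * 6]
refined = pass_messages(objects, subgraphs, links=[[0, 1], [1]])
```

In this sketch an object that belongs to several subgraphs weights their features by similarity to its own fused feature, so the geometry terms also influence which subgraph it attends to.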

Key words: Scene graph, Visual relationship detection, Attention mechanism, Message transmission, Feature fusion

CLC Number: 

  • TP391.4