Computer Science ›› 2020, Vol. 47 ›› Issue (6): 133-137. doi: 10.11896/jsjkx.190600110

• Computer Graphics & Multimedia •

Scene Graph Generation Model Combining Attention Mechanism and Feature Fusion

HUANG Yong-tao, YAN Hua   

  1. School of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2019-06-20 Online: 2020-06-15 Published: 2020-06-10
  • About author: HUANG Yong-tao, born in 1995, postgraduate, is not a member of China Computer Federation. His main research interests include computer vision, deep learning and parallel computing.
    YAN Hua, born in 1971, Ph.D., professor. His main research interests include intelligent algorithms, storage systems and path planning.
  • Supported by:
    This work was supported by the Natural Science Foundation of China (61403265).

Abstract: Understanding a visual scene requires not only recognizing individual objects in isolation but also capturing the interactions between them. A scene graph represents an image as a set of (subject-predicate-object) tuples that describe the relationships between its objects, and it is widely used in image understanding tasks. Existing scene graph generation models rely on complicated structures and are slow at inference. To address this problem, a scene graph generation model that combines an attention mechanism and feature fusion on top of the Factorizable Net structure is proposed. First, an image is decomposed into subgraphs, each of which contains several objects and their relationships. Then, position and shape information is fused into the object features, and an attention mechanism is used to pass messages between the object features and the subgraph features. Finally, object categories and the relationships between objects are inferred from the object features and the subgraph features. Experimental results on multiple visual relationship detection datasets show that the accuracy of visual relationship detection is 22.78% to 25.41% and the accuracy of scene graph generation is 16.39% to 22.75%, which are 1.2% and 1.8% higher than Factorizable Net, respectively. In addition, the proposed model performs the object relationship detection task in 0.6 seconds on a GTX 1080Ti graphics card. The results demonstrate that the subgraph structure significantly reduces the number of image regions to be inferred, and that feature fusion and the attention mechanism improve the quality of the deep features, so objects and their relationships can be predicted more quickly and accurately. The proposed model thus addresses the slow inference and low accuracy of traditional scene graph generation models.
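
The two middle steps of the abstract (fusing position and shape information into object features, then passing messages between object and subgraph features with attention) can be illustrated with a minimal sketch. The abstract does not give the model's actual formulas, so everything below is an assumption made for illustration: the function names, the normalized box geometry used as position and shape features, the dot-product attention scores, and the residual update are hypothetical and are not taken from the paper.

```python
import math


def fuse_geometry(feature, box, image_size):
    """Append normalized position and shape of a box to an object feature.

    box is (x1, y1, x2, y2); image_size is (width, height).
    Concatenation is an assumed fusion scheme, not the paper's.
    """
    x1, y1, x2, y2 = box
    w, h = image_size
    geometry = [
        (x1 + x2) / (2 * w),  # normalized center x (position)
        (y1 + y2) / (2 * h),  # normalized center y (position)
        (x2 - x1) / w,        # normalized width (shape)
        (y2 - y1) / h,        # normalized height (shape)
    ]
    return list(feature) + geometry


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(target, sources):
    """Return the attention-weighted sum of sources and the weights.

    Scores are plain dot products with the target (an assumption).
    """
    weights = softmax([dot(target, s) for s in sources])
    message = [
        sum(wt * s[i] for wt, s in zip(weights, sources))
        for i in range(len(target))
    ]
    return message, weights


def pass_messages(object_feats, subgraph_feats, membership):
    """One synchronous round of object <-> subgraph message passing.

    membership[k] lists the indices of the objects in subgraph k.
    Both directions read the old features; messages are added residually.
    """
    new_subgraphs = []
    for k, sub in enumerate(subgraph_feats):
        members = [object_feats[i] for i in membership[k]]
        msg, _ = attend(sub, members)
        new_subgraphs.append([a + b for a, b in zip(sub, msg)])

    new_objects = []
    for i, obj in enumerate(object_feats):
        related = [subgraph_feats[k] for k, m in enumerate(membership) if i in m]
        if related:
            msg, _ = attend(obj, related)
            new_objects.append([a + b for a, b in zip(obj, msg)])
        else:
            new_objects.append(list(obj))  # object belongs to no subgraph
    return new_objects, new_subgraphs


# Toy example: two objects sharing one subgraph (all values are made up).
fused = [
    fuse_geometry([1.0, 0.0], (10, 20, 30, 60), (100, 200)),
    fuse_geometry([0.0, 1.0], (50, 100, 90, 180), (100, 200)),
]
subgraphs = [[0.5, 0.5, 0.0, 0.0, 0.0, 0.0]]  # same length as fused features
membership = [[0, 1]]
new_objects, new_subgraphs = pass_messages(fused, subgraphs, membership)
```

In this sketch each subgraph gathers information from its member objects and each object gathers information from the subgraphs that contain it, which mirrors the two-way message transmission described above. A trained model would normally use learned projections instead of raw dot products.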

Key words: Attention mechanism, Feature fusion, Message transmission, Scene graph, Visual relationship detection

CLC Number: 

  • TP391.4