Computer Science (计算机科学), 2020, Vol. 47, Issue (6): 133-137. doi: 10.11896/jsjkx.190600110
HUANG Yong-tao (黄勇韬), YAN Hua (严华)
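Abstract: Visual scene understanding goes beyond recognizing individual objects in isolation: it also captures how different objects interact with one another. A scene graph records every (subject-predicate-object) triple that describes the relationships among the objects in an image, and it is widely used in scene understanding tasks. However, most existing scene graph generation models are structurally complex, slow at inference, and inaccurate, so they cannot be deployed directly in real-world settings. This paper therefore builds on Factorizable Net and proposes a scene graph generation model that combines an attention mechanism with feature fusion. First, the whole image is decomposed into several subgraphs, each containing multiple objects and the relationships among them. Next, each object's position and shape information is fused into its features, and an attention mechanism passes messages between object features and subgraph features. Finally, objects are classified from the object features and inter-object relationships are inferred from the subgraph features. Experimental results on several visual relationship detection datasets show that the model reaches a visual relationship detection accuracy of 22.78% to 25.41% and a scene graph generation accuracy of 16.39% to 22.75%, improvements of 1.2% and 1.8%, respectively, over Factorizable Net. On a single GTX 1080 Ti GPU, it detects the objects in an image and the relationships among them in under 0.6 s. These results indicate that the subgraph structure substantially reduces the number of image regions that require relationship inference, and that feature fusion and attention-based message passing improve the representational power of the deep features. Together, these allow objects and their relationships to be predicted faster and more accurately, addressing the poor timeliness and low accuracy of conventional scene graph generation models.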
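The abstract names two mechanisms: fusing each object's position and shape into its feature, and attention-based message passing between object features and subgraph features. The following is a minimal plain-Python sketch of those two steps. It rests on several assumptions that the abstract does not specify. The five geometric descriptors, concatenation as the fusion operator, scaled dot-product attention in which the keys also serve as values, and the residual update are all hypothetical choices, not the paper's exact formulation. The function names `geometry_features`, `fuse`, `attend`, and `message_passing` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Vec = List[float]


def geometry_features(box: Box, img_w: float, img_h: float) -> Vec:
    """Hypothetical position/shape descriptors, normalized by image size."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [
        (x1 + x2) / 2 / img_w,    # normalized center x (position)
        (y1 + y2) / 2 / img_h,    # normalized center y (position)
        w / img_w,                # relative width (shape)
        h / img_h,                # relative height (shape)
        w / h if h > 0 else 0.0,  # aspect ratio (shape)
    ]


def fuse(appearance: Sequence[float], box: Box, img_w: float, img_h: float) -> Vec:
    """Assumed fusion: concatenate the appearance feature with the geometry descriptors."""
    return list(appearance) + geometry_features(box, img_w, img_h)


def softmax(scores: Sequence[float]) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vec:
    """Scaled dot-product attention; keys double as values (no learned projections)."""
    if not keys:
        return [0.0] * len(query)  # nothing to attend to: empty message
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]


def message_passing(
    obj_feats: List[Vec], sub_feats: List[Vec], membership: List[List[int]]
) -> Tuple[List[Vec], List[Vec]]:
    """One synchronous round of object <-> subgraph message passing.

    membership[s] lists the object indices covered by subgraph s. Object and
    subgraph features are assumed to share one dimension (a trained model
    would first map them into a common space with learned weights).
    """
    new_obj = []
    for i, f in enumerate(obj_feats):
        # each object attends over the subgraphs that contain it
        owners = [sub_feats[s] for s, objs in enumerate(membership) if i in objs]
        msg = attend(f, owners)
        new_obj.append([a + b for a, b in zip(f, msg)])  # residual update
    new_sub = []
    for s, g in enumerate(sub_feats):
        # each subgraph attends over its member objects (pre-update features)
        members = [obj_feats[i] for i in membership[s]]
        msg = attend(g, members)
        new_sub.append([a + b for a, b in zip(g, msg)])
    return new_obj, new_sub


if __name__ == "__main__":
    # two toy objects in a 100 x 100 image, both covered by one subgraph
    objs = [fuse([1.0, 0.0, 0.0], (0, 0, 50, 50), 100, 100),
            fuse([0.0, 1.0, 0.0], (40, 20, 90, 80), 100, 100)]
    subs = [[0.1] * 8]  # toy subgraph feature with the same dimension (3 + 5)
    new_objs, new_subs = message_passing(objs, subs, [[0, 1]])
    print(new_objs[0][:3], new_subs[0][:3])
```

Because each subgraph groups several objects, the number of relationship-inference regions scales with the number of subgraphs rather than with every ordered object pair. In this sketch, that shows up as a single subgraph feature serving both objects.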