Computer Science (计算机科学), 2020, Vol. 47, Issue (4): 136-141. doi: 10.11896/jsjkx.190300002
庄志刚, 许青林
ZHUANG Zhi-gang, XU Qing-lin
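Abstract: A scene graph is a graph structure that describes the content of an image. Its generation suffers from two problems: 1) two-stage scene graph generation methods lose useful information, which makes the task harder; 2) the long-tailed distribution of visual relationships causes models to overfit and raises the error rate of relationship reasoning. To address these two problems, this paper proposes SGiF (Scene Graph in Features), a scene graph generation model that combines multi-scale feature maps with cycle-based relationship reasoning. First, the likelihood that a visual relationship exists is computed at every feature point on the multi-scale feature maps, and the features of the points with a high likelihood are extracted. Then, subject-object pairs are decoded from the extracted features, and the decoded results are deduplicated according to their category differences, which yields the scene graph structure. Finally, cycles that contain the target relationship edge are detected in the scene graph structure, the other edges on each cycle are used as the input for computing an adjustment factor, and this factor adjusts the original relationship reasoning result, which completes the generation of the scene graph. SGGen and PredCls are used as the evaluation settings. Experiments on a subset of the large-scale scene graph generation dataset VG (Visual Genome) show that, by using multi-scale feature maps, SGiF improves the visual relationship detection hit rate by 7.1% over a two-stage baseline, and that, by using cycle-based relationship reasoning, SGiF improves the relationship reasoning hit rate by 2.18% over a baseline without cycle-based reasoning, which demonstrates the effectiveness of SGiF.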
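The cycle-detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cycles_through_edge`, the `max_len` limit, the choice to ignore edge direction while searching, and the toy graph are all assumptions made for illustration. The abstract does not say how the adjustment factor is computed from the collected edges, so that part is left out.

```python
from collections import defaultdict


def cycles_through_edge(edges, target, max_len=3):
    """Collect, for each simple cycle containing `target`, its other edges.

    edges:   list of (subject, object) pairs forming the scene graph.
    target:  the (subject, object) edge whose relationship is being inferred.
    max_len: maximum number of edges in a cycle, target edge included
             (hypothetical parameter; 3 restricts the search to triangles).

    Assumption of this sketch: edge direction is ignored during the search,
    so returned edges are listed in traversal order, not original direction.
    """
    s, o = target
    adj = defaultdict(set)
    for a, b in edges:
        if {a, b} == {s, o}:
            continue  # leave out the target edge itself
        adj[a].add(b)
        adj[b].add(a)

    results = []

    def dfs(node, path):
        # `path` lists the nodes visited from `o` up to `node`.
        if node == s:
            # Closed the cycle: record the edges along the path.
            results.append(list(zip(path, path[1:])))
            return
        if len(path) - 1 >= max_len - 1:
            return  # the path plus the target edge would exceed max_len
        for nxt in sorted(adj[node]):
            if nxt not in path:  # keep the cycle simple
                dfs(nxt, path + [nxt])

    dfs(o, [o])
    return results


# Toy scene graph (made-up example, not data from the paper).
toy_edges = [
    ("man", "horse"),
    ("man", "hat"),
    ("hat", "horse"),
    ("horse", "grass"),
]
context = cycles_through_edge(toy_edges, ("man", "horse"))
print(context)
```

In this toy graph the only cycle through the edge (man, horse) is the triangle that passes through hat, so the two edges touching hat are what a model could feed into its adjustment-factor computation. An edge that lies on no cycle, such as (horse, grass), yields an empty list and would keep its original prediction.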