Computer Science ›› 2022, Vol. 49 ›› Issue (4): 209-214. doi: 10.11896/jsjkx.210100135

• Computer Graphics & Multimedia •

Scene Recognition Method Based on Multi-level Feature Fusion and Attention Module

XU Hua-jie1,2, QIN Yuan-zhuo1, YANG Yang1

  1. 1 College of Computer and Electronic Information, Guangxi University, Nanning 530004, China;
    2 Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
  • Received: 2021-01-18 Revised: 2021-05-20 Published: 2022-04-01
  • Corresponding author: YANG Yang (520012399@qq.com)
  • About author: (hjxu2009@163.com) XU Hua-jie, born in 1974, Ph.D, associate professor, is a member of China Computer Federation. His main research interests include artificial intelligence, acoustic signal recognition and computer vision. YANG Yang, born in 1995, postgraduate. Her main research interests include artificial intelligence and computer vision.
  • Supported by:
    This work was supported by the Science and Technology Plan Project of Guangxi Zhuang Autonomous Region (2017AB15008) and the Science and Technology Plan Project of Chongzuo (FB2018001).

Abstract: A scene image usually consists of background information and foreground objects. A convolutional neural network (CNN) used for scene recognition typically has to identify the scene category from the features of key objects in the scene, and sometimes from the positional relationships between those objects. However, the features of small key objects gradually vanish as the network deepens, which leads to recognition errors. To address this problem, a scene recognition method based on multi-level feature fusion and an attention module is proposed. First, the feature extraction part of the deep neural network ResNet-18 is divided into five branches. Then, the multi-level features output by the five branches are fused, and the fused features are used for scene recognition and classification, which compensates for the lost object information. Finally, an improved attention module is added to the network so that it focuses on learning the key objects in the scene image, further improving recognition. Experiments on several scene datasets show that the proposed method achieves recognition accuracies of 88.2%, 79.9% and 97.7% on the MIT-67, SUN-397 and UIUC-Sports datasets, respectively, higher than those of current mainstream scene recognition methods.

Key words: Attention module, Convolutional neural network, Feature fusion, Scene recognition
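
The abstract names three steps (five branches, fusion of multi-level features, an attention module) without giving their details, so the following is only a rough sketch of the general idea and not the authors' implementation. It assumes the five branches are tapped after the ResNet-18 stem and its four residual stages (channel widths 64, 64, 128, 256, 512), that fusion is plain concatenation of globally pooled features, and that a squeeze-and-excitation style gate stands in for the paper's "improved attention module". All function names, weights and spatial sizes are hypothetical, and random toy feature maps replace real network activations.

```python
import math
import random

def global_avg_pool(fmap):
    """Collapse a C x H x W feature map (nested lists) to a length-C vector."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]

def se_gates(desc, w1, w2):
    """SE-style channel gates: sigmoid(W2 @ relu(W1 @ desc)), each in (0, 1)."""
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]

def attend_and_pool(fmap, w1, w2):
    """Reweight channels by their gates, then pool. Pooling is linear, so
    scaling the pooled vector equals pooling the channel-scaled map."""
    desc = global_avg_pool(fmap)
    return [g * d for g, d in zip(se_gates(desc, w1, w2), desc)]

def fuse(branch_vectors):
    """Fuse multi-level descriptors by concatenation (an assumed fusion rule)."""
    return [v for vec in branch_vectors for v in vec]

def random_matrix(rows, cols, rng):
    """Stand-in for learned weights: small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
STAGE_CHANNELS = [64, 64, 128, 256, 512]  # assumed ResNet-18 tap widths
STAGE_SIZES = [8, 8, 4, 2, 1]             # toy spatial sizes, not real ones

branches = []
for c, s in zip(STAGE_CHANNELS, STAGE_SIZES):
    # Random toy feature map standing in for one branch's activations.
    fmap = [[[rng.random() for _ in range(s)] for _ in range(s)] for _ in range(c)]
    w1 = random_matrix(c // 16, c, rng)   # squeeze to c/16 hidden units
    w2 = random_matrix(c, c // 16, rng)   # excite back to c channels
    branches.append(attend_and_pool(fmap, w1, w2))

fused = fuse(branches)
print(len(fused))  # → 1024 (64 + 64 + 128 + 256 + 512)
```

Under these assumptions the fused descriptor keeps a contribution from the shallow, high-resolution stages, which is where small objects are still represented; a classifier over the 1024-dimensional vector would then see them alongside the deep features.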

CLC Number: 

  • TP391