Computer Science ›› 2021, Vol. 48 ›› Issue (10): 220-225. doi: 10.11896/jsjkx.200800073

• Computer Graphics & Multimedia •

Material Recognition Method Based on Attention Mechanism and Deep Convolutional Neural Network

XU Hua-jie1,2, YANG Yang1, LI Gui-lan3

  1. 1 College of Computer and Electronic Information, Guangxi University, Nanning 530004, China
    2 Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
    3 Guangxi Institute of Product Quality Inspection, Nanning 530007, China
  • Received: 2020-08-11 Revised: 2020-11-18 Online: 2021-10-15 Published: 2021-10-18
  • Corresponding author: XU Hua-jie (hjxu2009@163.com)
  • About author: XU Hua-jie, born in 1974, Ph.D., associate professor, is a senior member of China Computer Federation. His main research interests include artificial intelligence, acoustic signal recognition and computer vision.
  • Supported by:
    Science and Technology Plan Project of Guangxi Zhuang Autonomous Region (2017AB15008) and Science and Technology Plan Project of Chongzuo (FB2018001).

Abstract: Material recognition aims to identify the main objects in natural material images and the material categories they belong to. Material image datasets are usually small, and manually labeling local texture regions is difficult, which leads to low recognition accuracy. To address this problem, a material recognition method based on an attention mechanism and a deep convolutional neural network is proposed. The core of the method is a deep convolutional neural network for material recognition (MaterialNet). MaterialNet uses a deep residual network to extract image features and introduces the attention mechanism through the proposed cascaded atrous spatial pyramid pooling, so that the network can, through end-to-end training, adaptively focus on the key regions that contain texture features and thus effectively identify the local texture features of materials. Experiments on the FMD material dataset show that the overall recognition accuracy of MaterialNet reaches 82.3%, which is 7.2% and 4.5% higher than the current mainstream B-CNN and CNN+FV material recognition methods, respectively. MaterialNet achieves high recognition accuracy on a variety of materials and has the advantages of fewer parameters and less computation.

Key words: Atrous convolution, Attention mechanism, Deep convolutional neural network, Spatial pyramid pooling
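
The abstract names two building blocks, atrous convolution applied at several rates (a spatial pyramid) and attention that weights the key regions. The sketch below illustrates these two generic ideas on a 1-D signal in plain Python. It is not the authors' MaterialNet or their cascaded design, which this page does not specify. The function names (`atrous_conv1d`, `softmax`, `pyramid_attention_pool`), the choice to sum the branches, and the rates used are all assumptions made for illustration.

```python
import math


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `rate` positions apart, so a larger rate
    covers a wider context with the same number of weights.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - center) * rate  # dilated tap position
            if 0 <= j < len(signal):     # taps outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def pyramid_attention_pool(signal, kernel, rates=(1, 2, 4)):
    """Hypothetical fusion: sum parallel atrous branches into one score per
    position, turn the scores into attention weights, and return the
    attention-weighted average of the input together with the weights."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    scores = [sum(vals) for vals in zip(*branches)]
    weights = softmax(scores)
    pooled = sum(w * x for w, x in zip(weights, signal))
    return pooled, weights


if __name__ == "__main__":
    pooled, weights = pyramid_attention_pool([1, 2, 3, 4, 5], [1, 1, 1])
    print(pooled, weights)
```

In a trained network the kernel weights would be learned end to end, so the attention weights would come to favor texture-bearing regions. Here the kernel is fixed, and the example only shows how the pieces compose.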

CLC Number:

  • TP391