Computer Science (计算机科学) ›› 2022, Vol. 49 ›› Issue (11A): 211100009-5. doi: 10.11896/jsjkx.211100009

• Image Processing & Multimedia Technology •

Handwritten Digit Recognition Based on Attention Mechanism

LI Bo-yan1, ZHANG Yong2, YUAN De-rong2, XIONG Tang-tang1, HE Lang2

  1 School of Statistics, Jiangxi University of Finance and Economics, Nanchang 330000, China
  2 School of Software and Internet of Things Engineering, Jiangxi University of Finance and Economics, Nanchang 330000, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: ZHANG Yong (zhangyong@jxufe.edu.cn)
  • Author e-mail: BoyanLiS@qq.com
  • About author: LI Bo-yan, born in 1999, master. Her main research interests include education statistics and artificial intelligence.
    ZHANG Yong, born in 1975, Ph.D, professor. His main research interests include information security, intelligent systems, and quantum computing.
  • Supported by:
    National Natural Science Foundation of China (61762043), Natural Science Foundation of Jiangxi Province, China (20192BAB207022), and Key Science and Technology Research Project of Jiangxi Provincial Department of Education (GJJ190249).

Abstract: As an important branch of pattern recognition, handwritten digit recognition is attracting unprecedented interest, and convolutional neural networks are widely used in related research. Exploding and vanishing gradients tend to occur when training handwritten digit recognition models and lead to low recognition accuracy. To address this problem, a model embedding the convolutional block attention module (CBAM) is proposed for handwritten digit recognition. CBAM is embedded in the convolutional neural network to select effective features along the channel and spatial dimensions respectively, suppress irrelevant features, enhance the expressive power of the features, and improve the recognition accuracy of the model. To further improve recognition accuracy, the batch normalization (BN) algorithm is applied throughout the network architecture to speed up model convergence and thereby strengthen the model's resistance to overfitting. Experiments on the MNIST dataset show that the network with the embedded CBAM attention module reaches an overall recognition accuracy of 99.87%, a significant improvement over some traditional convolutional neural network models.

Key words: Handwritten digit recognition, Attention mechanism, Convolutional neural network, Deep learning
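
The abstract describes CBAM as filtering features first along the channel dimension and then along the spatial dimension. The sketch below illustrates that two-step reweighting in plain Python, following the general CBAM formulation (a shared two-layer perceptron over average- and max-pooled channel descriptors, then a convolution over channel-pooled maps). It is not the authors' network: the tensor sizes, the reduction ratio r, the 3x3 kernel, and the random weights are all assumptions chosen to keep the example small, and the batch normalization layers mentioned in the abstract are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, w1, w2):
    """Return one weight in (0, 1) per channel of feat, shaped [C][H][W]."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]

    def mlp(v):
        # Shared two-layer perceptron: C -> C/r with ReLU, then C/r -> C.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(feat, kernel):
    """Return an [H][W] map of weights in (0, 1); kernel is shaped [2][k][k]."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    k = len(kernel[0])
    pad = k // 2
    # Pool across channels to get two [H][W] maps: average and maximum.
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pooled = [avg, mx]
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # zero padding at the borders
                            s += kernel[p][di][dj] * pooled[p][y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to feat [C][H][W]."""
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[c] * v for v in row] for row in ch] for c, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]


# Hypothetical sizes and random weights, for illustration only.
random.seed(0)
C, H, W, r, k = 4, 5, 5, 2, 3
feat = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
kernel = [[[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)] for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))  # same shape as the input
```

Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1, which is how the module suppresses features it scores as irrelevant. In a trained network the weights would be learned jointly with the convolutional layers rather than drawn at random.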

CLC Number:

  • TP391