Computer Science, 2022, Vol. 49, Issue (6A): 434-440. doi: 10.11896/jsjkx.210900199
SUN Jie-qi1, LI Ya-feng2, ZHANG Wen-bo2, LIU Peng-hui2
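Abstract: Pooling is an important component of deep convolutional neural networks and one of the key factors in their success. In image recognition, however, applying conventional pooling directly discards feature information and lowers recognition accuracy. To address this loss of feature information, a dual-domain feature fusion module based on the discrete wavelet transform is proposed to overcome the drawbacks of direct pooling. The module fuses features in both the spatial domain and the channel domain, and places the pooling operation between the spatial-domain fusion module and the channel-domain fusion module, which effectively suppresses the feature information loss caused by direct pooling. Because it simply replaces an existing pooling operation, the new dual-domain feature fusion module can be embedded easily into currently popular deep neural network architectures. For image classification, a series of experiments was conducted on the CIFAR-10, CIFAR-100, and Mini-ImageNet datasets using mainstream architectures such as VGG, ResNet, and DenseNet. The experimental results show that the proposed method achieves higher classification accuracy than classical networks, popular networks with embedded attention mechanisms, and recent wavelet-based deep convolutional neural networks.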
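The information loss that motivates the module can be illustrated with a small example. The sketch below is not the paper's module: it assumes a one-level Haar wavelet (the abstract does not name the wavelet) and uses hypothetical helper names. It shows that 2x2 average pooling retains only a scaled copy of the low-frequency subband of the discrete wavelet transform, so two patches that differ only in their detail subbands become indistinguishable after pooling.

```python
# Illustrative sketch only. The Haar wavelet and every function name below
# are assumptions made for this example, not the method from the paper.

def haar_dwt2(x):
    """One-level 2D Haar DWT of a 2D list with even height and width.

    Returns four subbands: the low-frequency approximation and three
    detail subbands (difference across columns, across rows, diagonal).
    """
    approx, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, len(x), 2):
        r_a, r_c, r_r, r_d = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2)  # low-frequency approximation
            r_c.append((a - b + c - d) / 2)  # difference across columns
            r_r.append((a + b - c - d) / 2)  # difference across rows
            r_d.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(r_a)
        d_cols.append(r_c)
        d_rows.append(r_r)
        d_diag.append(r_d)
    return approx, d_cols, d_rows, d_diag


def avg_pool2(x):
    """2x2 average pooling with stride 2 on a 2D list."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


# Two patches with the same mean but different structure.
checker = [[1, 3], [3, 1]]
flat = [[2, 2], [2, 2]]

pooled_checker = avg_pool2(checker)
pooled_flat = avg_pool2(flat)
bands_checker = haar_dwt2(checker)
bands_flat = haar_dwt2(flat)

print(pooled_checker, pooled_flat)       # identical after pooling
print(bands_checker[3], bands_flat[3])   # diagonal detail still differs
```

In this toy case the diagonal detail subband is the only place where the two patches differ, which is the kind of information a wavelet-based fusion step could keep available around the pooling operation.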