Computer Science ›› 2022, Vol. 49 ›› Issue (6A): 434-440. doi: 10.11896/jsjkx.210900199

• Image Processing & Multimedia Technology •

  • Corresponding author: LI Ya-feng (liyafeng770729@126.com)
  • About author: (sunjieqi1017@163.com)

Dual-field Feature Fusion Deep Convolutional Neural Network Based on Discrete Wavelet Transformation

SUN Jie-qi1, LI Ya-feng2, ZHANG Wen-bo2, LIU Peng-hui2   

  1 School of Mathematics and Information Sciences, Baoji University of Arts and Sciences, Baoji, Shaanxi 721013, China
  2 School of Computer, Baoji University of Arts and Sciences, Baoji, Shaanxi 721016, China
  • Online: 2022-06-10 Published: 2022-06-08
  • About author: SUN Jie-qi, born in 1996, postgraduate. Her main research interests include image processing and pattern recognition.
    LI Ya-feng, born in 1977, Ph.D, professor. His main research interests include texture analysis, pattern recognition and optimization algorithm.
  • Supported by:
    National Natural Science Foundation of China (61971005), Industrial Research Project of Science and Technology Department of Shaanxi Province (2022GY-064) and Postgraduate Innovation Research Project of Baoji University of Arts and Sciences (YJSCX21YB09).

Abstract: Pooling is an essential component of deep convolutional neural networks and one of the key factors in their success. In image recognition, however, applying conventional pooling directly discards feature information and reduces recognition accuracy. To address this information loss, this paper proposes a dual-field feature fusion module based on the discrete wavelet transform. The module performs feature fusion in both the spatial domain and the channel domain, and embeds the pooling operation between the spatial feature fusion module and the channel feature fusion module, which effectively suppresses the feature information loss caused by direct pooling. By replacing existing pooling operations, the new dual-field feature fusion module can be easily embedded into popular deep neural network architectures. For image classification, extensive experiments are conducted on the CIFAR-10, CIFAR-100 and Mini-Imagenet datasets using mainstream network architectures such as VGG, ResNet and DenseNet. The results show that the proposed method achieves higher classification accuracy than classical networks, popular networks with embedded attention mechanisms, and the latest wavelet-based deep convolutional neural networks.

Key words: Attention mechanisms, Deep convolutional neural networks, Discrete wavelet transform, Feature fusion, Pooling operation

CLC Number: 

  • TP391
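
The abstract does not describe the internals of the dual-field feature fusion module, so the following is only a minimal NumPy sketch of the underlying observation: direct pooling keeps a single low-frequency summary of each 2x2 block, whereas a single-level discrete wavelet transform keeps that summary plus three detail subbands from which the input can be recovered exactly. The Haar wavelet is an assumption made here for simplicity (the abstract does not name the wavelet used), and the function names `haar_dwt2`, `haar_idwt2` and `avg_pool2` are hypothetical helpers, not part of the authors' code.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT of a 2D array with even height and width.

    Returns four half-resolution subbands: the low-frequency subband (ll)
    and three detail subbands (lh, hl, hh; naming conventions vary).
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0  # block sum, scaled (low-pass in both directions)
    lh = (a - b + c - d) / 2.0  # difference between columns
    hl = (a + b - c - d) / 2.0  # difference between rows
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def avg_pool2(x):
    """Plain 2x2 average pooling with stride 2."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


if __name__ == "__main__":
    flat = np.ones((2, 2))
    checker = np.array([[0.0, 2.0], [2.0, 0.0]])
    # Both inputs pool to the same value, so pooling alone cannot tell them apart.
    print("pooled:", avg_pool2(flat), avg_pool2(checker))
    # The diagonal detail subband still separates them.
    print("hh:", haar_dwt2(flat)[3], haar_dwt2(checker)[3])
```

In this sketch the `ll` subband is exactly twice the average-pooled output, so average pooling corresponds to keeping `ll` and discarding the three detail subbands. How the proposed module fuses those subbands across the spatial and channel domains is not specified in the abstract and is not modelled here.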