Computer Science, 2024, Vol. 51, Issue (6A): 230300204-8. doi: 10.11896/jsjkx.230300204

• Image Processing & Multimedia Technology •

Lightweight Image Semantic Segmentation Based on Attention Mechanism and Densely Adjacent Prediction

WANG Guogang, DONG Zhihao

  1. College of Physics and Electronic Engineering, Shanxi University, Taiyuan 030006, China
  • Published: 2024-06-06
  • Corresponding author: WANG Guogang (kingguogang@sxu.edu.cn)
  • About author: WANG Guogang, born in 1977, Ph.D., associate professor, is a member of CCF (No. K7194M). His main research interests include image processing, computer vision, and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (11804209) and Natural Science Foundation of Shanxi Province, China (201901D111031, 201901D211173).

Abstract: DeepLabv3+ has three weaknesses: its computational complexity is high, its atrous spatial pyramid pooling (ASPP) module has difficulty highlighting important channel features, and the highly semantic feature maps produced by its decoder lack sufficient detail. To address these problems, a lightweight image semantic segmentation model based on an attention mechanism and densely adjacent prediction is proposed. The model uses the lightweight MobileNetV2 as its backbone network to reduce the number of parameters. Channel atrous spatial pyramid pooling extracts multi-scale information and weights each channel of the feature map, reinforcing the learning of important channel features. Densely adjacent prediction then fuses high-level and low-level features to refine the segmentation results. Experiments on the augmented PASCAL VOC 2012 dataset show that both the mean intersection over union (mIoU) and the mean pixel accuracy (mPA) of the proposed method are higher than those of seven mainstream comparison algorithms. Compared with DeepLabv3+, the number of parameters and the computational cost are reduced by 184.82×10^6 and 90.83 GFLOPs, respectively, so the proposed algorithm improves segmentation accuracy while lowering computational overhead.

Key words: Deep learning, Semantic segmentation, DeepLabv3+, Attention mechanism

CLC number:

  • TP391
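The abstract attributes much of the parameter reduction to the MobileNetV2 backbone, whose blocks rely on depthwise separable convolutions. The sketch below is illustrative only: it counts weights (bias and BatchNorm ignored) of a standard k×k convolution versus a depthwise separable one, and the multiply-accumulate operations for a hypothetical output size. It does not reproduce the paper's 184.82×10^6 and 90.83 GFLOPs figures, which depend on the full network configurations; note also that some papers report multiply-accumulates as "FLOPs" while others count about two FLOPs per multiply-accumulate.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a k x k standard convolution (bias and BatchNorm ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depthwise convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out


def conv_macs(params: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates of a stride-1 convolution: each weight is used once per output pixel."""
    return params * h_out * w_out


std = standard_conv_params(3, 256, 256)        # 589824
sep = depthwise_separable_params(3, 256, 256)  # 67840
print(std, sep, round(std / sep, 2))
# Hypothetical 64 x 64 output feature map, chosen only for illustration.
print(conv_macs(std, 64, 64), conv_macs(sep, 64, 64))
```

For these sizes the separable layer needs roughly one eighth of the weights, and because every weight is applied once per output pixel the compute shrinks by the same factor.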
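Channel atrous spatial pyramid pooling, like the ASPP it builds on, gathers multi-scale context by running parallel atrous (dilated) convolutions with different rates. A minimal 1-D sketch of the idea: inserting rate − 1 gaps between the taps of a k-tap kernel widens its span to k + (k − 1)(rate − 1) without adding weights. The rates 6, 12 and 18 below are the DeepLabv3/v3+ ASPP defaults at output stride 16 and are only an assumption here; the rates used in the paper's channel ASPP are not given in this abstract.

```python
from typing import List


def effective_kernel_size(k: int, rate: int) -> int:
    """Span covered by a k-tap kernel when rate - 1 gaps are inserted between taps."""
    return k + (k - 1) * (rate - 1)


def atrous_conv1d(x: List[float], w: List[float], rate: int) -> List[float]:
    """'Valid' 1-D atrous convolution (cross-correlation form):
    y[i] = sum_j w[j] * x[i + rate * j]."""
    span = effective_kernel_size(len(w), rate)
    return [
        sum(wj * x[i + rate * j] for j, wj in enumerate(w))
        for i in range(len(x) - span + 1)
    ]


# Assumed branch rates (DeepLabv3/v3+ defaults), not taken from the paper.
for rate in (6, 12, 18):
    print(rate, effective_kernel_size(3, rate))  # spans 13, 25, 37

print(atrous_conv1d(list(range(10)), [1, 1, 1], 2))  # [6, 9, 12, 15, 18, 21]
```

Each 3-tap branch keeps the same three weights, so larger rates buy a wider receptive field at no extra parameter cost.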
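The abstract states that channel ASPP weights each channel of the feature map to emphasize important channels, but it does not specify the weighting mechanism here. The sketch below uses squeeze-and-excitation (SE) channel attention purely as an assumed stand-in: global average pooling squeezes each channel to one number, a small two-layer bottleneck maps these to per-channel weights in (0, 1), and each channel is rescaled. The weights `w1`, `b1`, `w2` and `b2` are hypothetical hand-picked values; in a network they would be learned.

```python
import math
from typing import List, Tuple

Channel = List[List[float]]  # one H x W feature map


def linear(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one row of `weight` per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def channel_attention(features: List[Channel], w1, b1, w2, b2) -> Tuple[List[Channel], List[float]]:
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid -> one weight in (0, 1) per channel.
    hidden = [max(0.0, v) for v in linear(pooled, w1, b1)]
    scales = [1.0 / (1.0 + math.exp(-v)) for v in linear(hidden, w2, b2)]
    # Scale: multiply every value of channel c by its weight.
    reweighted = [[[s * v for v in row] for row in ch] for ch, s in zip(features, scales)]
    return reweighted, scales


# Two 2 x 2 channels; the hypothetical weights favour channel 0 and suppress channel 1.
feats = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
out, scales = channel_attention(feats, [[1.0, 1.0]], [0.0], [[1.0], [-1.0]], [0.0, 0.0])
print(scales)
```

Because the weights come from globally pooled statistics, the same scale applies to every spatial position of a channel; the attention changes *which* channels dominate, not *where*.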
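Densely adjacent prediction (DAP) was introduced in ExFuse: instead of each spatial position predicting only its own class scores, it predicts scores for a k×k neighbourhood (so the prediction layer outputs C·k² channels), and the final score of a pixel is the average of the predictions that its neighbours made for it. The sketch below shows only that aggregation step. The channel layout (offset index o = (dy + r)·k + (dx + r) with r = k // 2) and the border rule (averaging only over neighbours that exist) are assumptions for illustration, not details taken from the paper; a real network would do this with tensor shifts rather than Python loops.

```python
from typing import List

Grid = List[List[float]]


def dap_aggregate(scores: List[List[Grid]], k: int) -> List[Grid]:
    """scores[c][o][y][x]: score for class c predicted at (y, x) for the target pixel
    (y + dy, x + dx), where o = (dy + r) * k + (dx + r) and r = k // 2 (assumed layout).
    Returns out[c][i][j]: the mean of all predictions that target pixel (i, j)."""
    r = k // 2
    h, w = len(scores[0][0]), len(scores[0][0][0])
    out = []
    for class_scores in scores:
        plane = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                total, count = 0.0, 0
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        y, x = i - dy, j - dx  # neighbour whose offset (dy, dx) lands on (i, j)
                        if 0 <= y < h and 0 <= x < w:  # border: skip missing neighbours (assumption)
                            total += class_scores[(dy + r) * k + (dx + r)][y][x]
                            count += 1
                plane[i][j] = total / count
        out.append(plane)
    return out


# One class, k = 3, 3 x 3 map: only the centre offset (o = 4) carries a score of 9.
k = 3
scores = [[[[9.0 if o == 4 else 0.0 for _ in range(3)] for _ in range(3)] for o in range(k * k)]]
out = dap_aggregate(scores, k)
print(out[0][1][1], out[0][0][0])  # 1.0 2.25
```

The centre pixel averages nine predictions while the corner averages only four, which is why the same centre-offset score yields different final values there.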
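The results are reported as mean intersection over union (mIoU) and mean pixel accuracy (mPA). Both follow from a per-class confusion matrix: for class c, IoU = TP / (TP + FP + FN) and pixel accuracy = TP / (number of ground-truth pixels of class c), each averaged over classes. The sketch below uses these standard definitions; how the paper handles classes absent from the evaluation set, or the ignored "void" pixels in PASCAL VOC, is not stated here, so skipping classes with an empty denominator is an assumption.

```python
from typing import List, Tuple


def miou_mpa(conf: List[List[int]]) -> Tuple[float, float]:
    """conf[g][p] counts pixels of ground-truth class g predicted as class p."""
    n = len(conf)
    ious, accs = [], []
    for c in range(n):
        tp = conf[c][c]
        gt = sum(conf[c])                         # pixels whose true class is c (TP + FN)
        pred = sum(conf[g][c] for g in range(n))  # pixels predicted as c (TP + FP)
        union = gt + pred - tp                    # TP + FP + FN
        if union > 0:  # skip classes absent from both prediction and ground truth (assumption)
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)                  # per-class pixel accuracy
    return sum(ious) / len(ious), sum(accs) / len(accs)


miou, mpa = miou_mpa([[3, 1], [2, 4]])
print(round(miou, 4), round(mpa, 4))  # 0.5357 0.7083
```

In this two-class example, class 0 has IoU 3/6 and accuracy 3/4, and class 1 has IoU 4/7 and accuracy 4/6; the reported metrics are the means of those pairs.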