Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 230200010-6. doi: 10.11896/jsjkx.230200010

• Image Processing & Multimedia Technology •

Real-time Image Semantic Segmentation Algorithm Based on Hybrid Attention

WANG Yan, XIA Chuangshuai, WANG Na, NAN Peiqi

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Published: 2023-11-09
  • Corresponding author: XIA Chuangshuai (xiachuangshuai@163.com)
  • About author: WANG Yan (wangyan@lut.edu.cn), born in 1971, master, professor, is a member of China Computer Federation. Her main research interests include pattern recognition and artificial intelligence.
    XIA Chuangshuai, born in 1998, master. His main research interests include pattern recognition and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61863025).

Abstract: Existing semantic segmentation algorithms are difficult to deploy on mobile devices because their models are complex and computationally expensive. To address this, a real-time image semantic segmentation algorithm based on hybrid attention is proposed. The algorithm uses an asymmetric encoder-decoder structure. The encoder combines depth-wise separable convolution and dilated convolution into an efficient residual unit that extracts image features at different network depths, attending more to spatial position information in the shallow layers and strengthening semantic information extraction in the deep layers. The decoder contains a hybrid attention feature fusion module, which uses spatial attention to strengthen the spatial location information of the shallow layers and channel attention to enhance the expression of key information in the deep feature maps. The module effectively fuses the spatial and context information in feature maps of different levels, strengthens the expression of semantic information, and reduces the loss of image information during fusion. Finally, a classifier produces the segmentation prediction map. Extensive experiments show that the proposed algorithm achieves 93.2% pixel accuracy (PA) and 73.2% mean intersection over union (mIoU) on the Cityscapes dataset, running at 38 FPS with 1.62×10^6 parameters on a Tesla V100 GPU. On the Pascal VOC 2012 dataset, PA and mIoU reach 92.4% and 74.8%, respectively. These results show that the algorithm can segment urban scene images effectively and in real time.
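The abstract reports results as PA and mIoU. As a rough illustration of how these two metrics are commonly computed from a confusion matrix, here is a minimal pure-Python sketch. It follows the standard definitions, not necessarily the exact evaluation protocol used in the paper (for example, how ignored labels are handled is not stated on this page). The function names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index the ground-truth class, columns the predicted class.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """PA: correctly classified pixels divided by all pixels."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """mIoU: per-class TP / (TP + FP + FN), averaged over classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy example: six flattened pixels, three classes (made-up values).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
pa = pixel_accuracy(cm)
miou = mean_iou(cm)
```

On this toy input four of six pixels are correct, so PA is 2/3, while the per-class IoU values (1/3, 2/3, 1/2) average to an mIoU of 0.5. The gap between the two shows why segmentation papers usually report both.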

Key words: Deep learning, Semantic segmentation, Real-time, Feature fusion, Attention mechanism
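The encoder described in the abstract relies on depth-wise separable and dilated convolutions to keep the model small. The sketch below works through the usual parameter-count arithmetic behind that design choice. It is a generic illustration only: the channel width of 64, the 3×3 kernel, and the function names are assumptions, not values taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernel.
std = standard_conv_params(64, 64, 3)
sep = depthwise_separable_params(64, 64, 3)
reduction = std / sep

# Dilation enlarges the receptive field without adding weights.
extent_d2 = effective_kernel_size(3, 2)
extent_d4 = effective_kernel_size(3, 4)
```

For this hypothetical layer the standard convolution needs 36864 weights and the separable version 4672, roughly an eightfold reduction. A 3×3 kernel with dilation 2 or 4 covers a 5×5 or 9×9 window with the same nine weights per filter.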

CLC Number:

  • TP391