Computer Science, 2024, Vol. 51, Issue (6A): 230900153-7. doi: 10.11896/jsjkx.230900153

Image Processing & Multimedia Technology

Clothing Image Segmentation Method Based on Deeplabv3+ Fused with Attention Mechanism

XIAO Yahui1, ZHANG Zili1,2, HU Xinrong1,2, PENG Tao1,2, ZHANG Jun3

  1 School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China
  2 Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, China
  3 School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
  • Published: 2024-06-06
  • Corresponding author: ZHANG Zili (zlzhang@wtu.edu.cn)
  • About author: XIAO Yahui (2115363055@mail.wtu.edu.cn), born in 1999, postgraduate, is a member of CCF (No. Q0221G). Her main research interests include machine learning and image processing.
    ZHANG Zili, born in 1981, Ph.D., lecturer. His main research interests include machine learning and image processing.
  • Supported by:
    Science and Technology Research Project of Education Department of Hubei Province (B2017066).

Abstract: In clothing image semantic segmentation, garment color, texture, background, and occlusion among multiple objects lead to rough segmentation edges and low segmentation accuracy. To address these problems, this paper proposes FFDNet, an image semantic segmentation method built on the Deeplabv3+ framework and fused with attention mechanisms. First, the model uses ResNet101 as its backbone network, and a channel-spatial attention module, the feature-enhanced attention module (FEAM), is added at the end of the backbone. FEAM weights the feature map along both the channel and spatial dimensions to mine and enhance feature information, refine segmentation edges, and improve the representational capacity of the network. Second, a feature align module (FAM) is introduced as a new upsampling method. It addresses the segmentation errors and low efficiency caused by misalignment between features when features of different scales are fused, thereby improving the accuracy and robustness of clothing image segmentation. Finally, FFDNet reaches a mean intersection over union (mIoU) of 55.2% on DeepFashion2 and 79.4% on PASCAL VOC 2012. In terms of parameter size, the model adds only 0.61 MB over the original Deeplabv3+ model on DeepFashion2. Compared with other existing classic models, FFDNet achieves better segmentation performance, effectively captures local image detail, and reduces pixel classification errors.

Key words: Clothing image, Semantic segmentation, Attention mechanism, Deeplabv3+ network, Feature alignment
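The abstract states only that FEAM, placed at the end of the ResNet101 backbone, weights the feature map along the channel and spatial dimensions. The NumPy sketch below shows one way to do that, assuming a CBAM-style design: channel weights come from average- and max-pooled descriptors passed through a shared two-layer MLP, and spatial weights come from a small convolution over the channel-wise mean and max maps. The function names, the reduction ratio `r`, the 3 x 3 kernel, and the random weights are hypothetical stand-ins for learned parameters; the paper's FEAM may be structured differently.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight the channels of feat (C, H, W).

    w1 (C//r, C) and w2 (C, C//r) stand in for a learned two-layer MLP (hypothetical weights).
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # shared MLP with ReLU

    avg = feat.mean(axis=(1, 2))  # (C,) global average pooling
    mx = feat.max(axis=(1, 2))    # (C,) global max pooling
    weights = sigmoid(mlp(avg) + mlp(mx))  # one weight in (0, 1) per channel
    return feat * weights[:, None, None]


def spatial_attention(feat, kernel):
    """Reweight the spatial positions of feat (C, H, W).

    kernel (2, k, k), k odd, stands in for a learned k x k convolution over the
    channel-wise mean and max maps (hypothetical weights).
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H, W
    _, h, w = feat.shape
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]


def feam_sketch(feat, w1, w2, kernel):
    """Channel weighting followed by spatial weighting (CBAM ordering, assumed)."""
    return spatial_attention(channel_attention(feat, w1, w2), kernel)


rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
feat = rng.standard_normal((C, H, W))
out = feam_sketch(feat,
                  rng.standard_normal((C // r, C)),
                  rng.standard_normal((C, C // r)),
                  rng.standard_normal((2, 3, 3)))
print(out.shape)  # (8, 5, 5)
```

Because every weight lies in (0, 1), this kind of module only attenuates activations; informative channels and positions stand out because the rest are suppressed more strongly.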
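The abstract does not specify how FAM aligns features of different scales. One common way to build an alignment-aware upsampler, used for example in semantic-flow segmentation networks, is to predict a per-pixel offset ("flow") field and resample the upsampled high-level feature along it before fusing it with the low-level feature. The NumPy sketch below shows only that resampling-and-fusion step, under stated assumptions: the offset field `flow` is passed in rather than predicted by a learned convolution, fusion is element-wise addition, and `bilinear_sample` / `align_and_fuse` are hypothetical names, not the paper's code.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Sample feat (C, H, W) at fractional row/col coordinates ys, xs (same shape),
    clamping coordinates to the feature-map border."""
    _, h, w = feat.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0  # fractional parts act as interpolation weights
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def align_and_fuse(low, high, flow):
    """Upsample the coarse feature `high` (C, h, w) onto the grid of the fine
    feature `low` (C, H, W), shift every sampling point by `flow` (2, H, W),
    given as (dy, dx) offsets in coarse-grid pixels, then fuse by addition.

    In a trained network `flow` would be predicted from both inputs; here it is
    a hypothetical input supplied by the caller.
    """
    _, H, W = low.shape
    _, h, w = high.shape
    # Base grid maps corner pixels onto corner pixels (align_corners-style).
    ys, xs = np.meshgrid(np.linspace(0, h - 1, H), np.linspace(0, w - 1, W), indexing="ij")
    aligned = bilinear_sample(high, ys + flow[0], xs + flow[1])
    return low + aligned


high = np.arange(18, dtype=float).reshape(2, 3, 3)
low = np.zeros((2, 5, 5))
plain = align_and_fuse(low, high, np.zeros((2, 5, 5)))  # zero flow: ordinary bilinear upsampling
shift = np.zeros((2, 5, 5))
shift[1] = 1.0  # sample every point one coarse pixel further right
shifted = align_and_fuse(low, high, shift)
```

With a zero offset field this reduces to plain bilinear upsampling, so any gain from alignment comes entirely from how well the offsets are learned.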
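The results above are reported as mean intersection over union (mIoU): for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), and mIoU averages IoU_c over classes. A minimal NumPy sketch of the standard computation from a pixel confusion matrix follows. The abstract does not say how classes absent from both ground truth and prediction are handled or which label marks ignored pixels; here such classes are skipped, and 255 is assumed as the ignore label (the PASCAL VOC convention for void pixels).

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Rows index ground-truth classes, columns index predicted classes.
    Pixels labelled `ignore_index` in gt are skipped."""
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); classes with an empty union are excluded."""
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp  # predicted + actual - overlap
    iou = np.full(conf.shape[0], np.nan)
    valid = union > 0
    iou[valid] = tp[valid] / union[valid]
    return np.nanmean(iou), iou


gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
miou, per_class = mean_iou(confusion_matrix(gt, pred, num_classes=3))
print(per_class)        # per-class IoU: 1/3, 2/3, 1/2
print(round(miou, 4))   # 0.5
```

In practice the confusion matrix is accumulated over the whole test set before IoU is computed, rather than averaging per-image scores.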

CLC number: TP391