Computer Science ›› 2024, Vol. 51 ›› Issue (8): 133-142. doi: 10.11896/jsjkx.230700207
LIU Sichun, WANG Xiaoping, PEI Xilong, LUO Hangyu
Abstract: Complex tasks such as urban scene segmentation suffer from under-exploited spatial information in feature maps, imprecise segmentation boundaries, and excessively large networks. To address these problems, a dual-learning-based scene segmentation model, DualSeg, is proposed. First, depthwise separable convolutions are adopted to substantially reduce the number of model parameters. Second, atrous spatial pyramid pooling is combined with a dual attention module to capture accurate contextual information. Finally, dual learning is used to build a closed-loop feedback network: the dual relation constrains the mapping space, and the two tasks of image scene segmentation and dual image reconstruction are trained jointly, which assists the training of the segmentation model and helps it better perceive class boundaries and improve recognition. Experimental results show that, on the natural scene segmentation dataset PASCAL VOC, the DualSeg model with an Xception backbone achieves 81.3% mIoU and 95.1% global accuracy, and it reaches 77.4% mIoU on the Cityscapes dataset while reducing the number of parameters by 18.45%, validating the effectiveness of the model. Future work will explore more effective attention mechanisms to further improve segmentation accuracy.
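The parameter savings claimed for depthwise separable convolutions follow from a simple count: a standard k×k convolution needs k·k·C_in·C_out weights, while factoring it into a depthwise k×k step plus a 1×1 pointwise step needs only k·k·C_in + C_in·C_out. A minimal sketch of this comparison is shown below; the channel sizes and 3×3 kernel are illustrative assumptions, not values taken from the DualSeg model itself.

```python
# Sketch: parameter counts for a standard convolution versus a
# depthwise separable one (depthwise k x k + pointwise 1 x 1),
# ignoring bias terms. Shapes are hypothetical, for illustration only.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out

def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k step plus a 1 x 1 pointwise step."""
    return k * k * c_in + c_in * c_out

c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)   # 589824
sep = separable_conv_params(c_in, c_out, k)  # 67840
print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For these illustrative shapes the separable factorization keeps roughly 11.5% of the standard convolution's weights, which is why backbones built on it (such as Xception) are markedly smaller at similar depth.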