一种基于对偶学习的场景分割模型

doi:10.11896/jsjkx.230700207

Abstract

Abstract: For complex tasks such as urban scene segmentation,there are problems such as low utilization of feature map space information,inaccurate segmentation boundaries,and excessive network parameters.To solve these problems,DualSeg,a scene segmentation model based on dual learning,is proposed.Firstly,depthwise separable convolution is used to significantly reduce the number of model parameters Secondly,accurate context information is obtained by fusing hollow pyramid pooling and double attention mechanism modules.Finally,dual learning is used to construct a closed-loop feedback network,and the mapping space is constrained by duality,while training the two tasks of “image scene segmentation” and “dual image reconstruction”,it can assist the training of the scene segmentation model,help the model to better perceive the category boundary and improve the recogni-tion ability.Experimental results show that the DualSeg model based on the Xception skeleton network achieves 81.3% mIoU and 95.1% global accuracy on natural scene segmentation dataset PASCAL VOC,respectively,and the mIoU reaches 77.4% on the CityScapes dataset,and the number of model parameters decreases by 18.45%,which verifies the effectiveness of the model.A more effective attention mechanism will be explored in the future to further improve the segmentation accuracy.

Key words: Scene segmentation, Image reconstruction, Dual learning, Attention mechanism, Depthwise separable convolution, Multi-level feature fusion

CLC Number:

TP391

LIU Sichun, WANG Xiaoping, PEI Xilong, LUO Hangyu. Scene Segmentation Model Based on Dual Learning[J].Computer Science, 2024, 51(8): 133-142.

References

[1]LONG J,SHELHAMER E,DARRELL T.Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:3431-3440.
[2]BADRINARAYANAN V,KENDALL A,CIPOLLA R.Segnet:A deep convolutional encoder-decoder architecture for image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(12):2481-2495.
[3]CHEN L C,PAPANDREOU G,KOKKINOS I,et al.Semantic image segmentation with deep convolutional nets and fully connected crfs[J].arXiv:1412.7062,2014.
[4]CHEN L C,PAPANDREOU G,KOKKINOS I,et al.Deeplab:Semantic image segmentation with deep convolutional nets,atrous convolution,and fully connected crfs[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,40(4):834-848.
[5]CHEN L C,PAPANDREOU G,SCHROFF F,et al.Rethinking atrous convolution for semantic image segmentation[J].arXiv:1706.05587,2017.
[6]CHEN L C,ZHU Y,PAPANDREOU G,et al.Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision(ECCV).2018:801-818.
[7]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.Animage is worth 16x16 words:Transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[8]LIU Z,LIN Y,CAO Y,et al.Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:10012-10022.
[9]ZHENG S,LU J,ZHAO H,et al.Rethinking semantic segmentation from a sequence-to-sequence perspective with transfor-mers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:6881-6890.
[10]XIE E,WANG W,YU Z,et al.SegFormer:Simple and efficient design for semantic segmentation with transformers[J].Advances in Neural Information Processing Systems,2021,34:12077-12090.
[11]CHENG B,MISRA I,SCHWING A G,et al.Masked-attention mask transformer for universal image segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:1290-1299.
[12]JAIN J,LI J,CHIU M T,et al.Oneformer:One transformer to rule universal image segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:2989-2998.
[13]LUO P,WANG G,LIN L,et al.Deep dual learning for semantic image segmentation[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2718-2726.
[14]CHEN J L,PENG Y B,LI N.Single image super-resolution reconstruction network based on dual learning strategy [J].Computer Application Research,2021,38(7):2235-2240.
[15]WANG L,LI D,ZHU Y,et al.Dual super-resolution learningfor semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:3774-3783.
[16]CORDTS M,OMRAN M,RAMOS S,et al.The cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:3213-3223.
[17]EVERINGHAM M,ESLAMI S M A,VAN G L,et al.The pascal visual object classes challenge:A retrospective[J].International Journal of Computer Vision,2015,111:98-136.
[18]CHOLLET F.Xception:Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1251-1258.
[19]BOYD S P,VANDENBERGHE L.Convex optimization[M].Cambridge University Press,2004.
[20]SUN K,ZHAO Y,JIANG B,et al.High-resolution representations for labeling pixels and regions[J].arXiv:1904.04514,2019.
[21]ZHAO H,SHI J,QI X,et al.Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:2881-2890.
[22]HE K,ZHANG X,REN S,et al.Deep residual learning forimage recognition[C]//Proceedings of the IEEE Confe-rence on Computer Vision and Pattern Recognition.2016:770-778.
[23]SELVARAJU R R,COGSWELL M,DAS A,et al.Grad-cam:Visual explanations from deep networks via gradient-based localization[C]//Proceedings of the IEEE International Confe-rence on Computer Vision.2017:618-626.

Related Articles 15

[1]	XIAO Xiao, BAI Zhengyao, LI Zekai, LIU Xuheng, DU Jiajin. Parallel Multi-scale with Attention Mechanism for Point Cloud Upsampling [J]. Computer Science, 2024, 51(8): 183-191.
[2]	PU Bin, LIANG Zhengyou, SUN Yu. Monocular 3D Object Detection Based on Height-Depth Constraint and Edge Fusion [J]. Computer Science, 2024, 51(8): 192-199.
[3]	ZHANG Junsan, CHENG Ming, SHEN Xiuxuan, LIU Yuxue, WANG Leiquan. Diversified Label Matrix Based Medical Image Report Generation [J]. Computer Science, 2024, 51(8): 200-208.
[4]	WANG Chao, TANG Chao, WANG Wenjian, ZHANG Jing. Infrared Human Action Recognition Method Based on Multimodal Attention Network [J]. Computer Science, 2024, 51(8): 232-241.
[5]	ZHANG Lu, DUAN Youxiang, LIU Juan, LU Yuxi. Chinese Geological Entity Relation Extraction Based on RoBERTa and Weighted Graph Convolutional Networks [J]. Computer Science, 2024, 51(8): 297-303.
[6]	CHEN Shanshan, YAO Subin. Study on Recommendation Algorithms Based on Knowledge Graph and Neighbor PerceptionAttention Mechanism [J]. Computer Science, 2024, 51(8): 313-323.
[7]	ZHANG Rui, WANG Ziqi, LI Yang, WANG Jiabao, CHEN Yao. Task-aware Few-shot SAR Image Classification Method Based on Multi-scale Attention Mechanism [J]. Computer Science, 2024, 51(8): 160-167.
[8]	WANG Qian, HE Lang, WANG Zhanqing, HUANG Kun. Road Extraction Algorithm for Remote Sensing Images Based on Improved DeepLabv3+ [J]. Computer Science, 2024, 51(8): 168-175.
[9]	FAN Yi, HU Tao, YI Peng. Host Anomaly Detection Framework Based on Multifaceted Information Fusion of SemanticFeatures for System Calls [J]. Computer Science, 2024, 51(7): 380-388.
[10]	BAI Wenchao, BAI Shuwen, HAN Xixian, ZHAO Yubo. Efficient Query Workload Prediction Algorithm Based on TCN-A [J]. Computer Science, 2024, 51(7): 71-79.
[11]	ZENG Zihui, LI Chaoyang, LIAO Qing. Multivariate Time Series Anomaly Detection Algorithm in Missing Value Scenario [J]. Computer Science, 2024, 51(7): 108-115.
[12]	YANG Zhenzhen, WANG Dongtao, YANG Yongpeng, HUA Renyu. Multi-embedding Fusion Based on top-N Recommendation [J]. Computer Science, 2024, 51(7): 140-145.
[13]	HU Haibo, YANG Dan, NIE Tiezheng, KOU Yue. Graph Contrastive Learning Incorporating Multi-influence and Preference for Social Recommendation [J]. Computer Science, 2024, 51(7): 146-155.
[14]	LI Jiaying, LIANG Yudong, LI Shaoji, ZHANG Kunpeng, ZHANG Chao. Study on Algorithm of Depth Image Super-resolution Guided by High-frequency Information ofColor Images [J]. Computer Science, 2024, 51(7): 197-205.
[15]	LOU Zhengzheng, ZHANG Xin, HU Shizhe, WU Yunpeng. Foggy Weather Object Detection Method Based on YOLOX_s [J]. Computer Science, 2024, 51(7): 206-213.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Scene Segmentation Model Based on Dual Learning

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 15

Metrics

Comments

Recommended 0