Study of Multi-task Learning with Joint Semantic Segmentation and Depth Estimation

doi:10.11896/jsjkx.220100111

Abstract

Semantic segmentation and depth estimation are two closely related pixel-level image classification tasks. This paper proposes two multi-task learning architectures, designed from the perspectives of shared feature extraction and feature interaction and fusion, to jointly learn semantic segmentation and depth estimation: MTL_SPP, a multi-task learning network based on squeeze and excitation (SE) and pyramid pooling, and MTL_SSW, a multi-task learning network based on SE and selective weights (SW). The MTL_SPP architecture consists of a shared backbone feature network and task-specific sub-networks; it uses the SE module to construct the task-specific sub-networks and pyramid pooling to enhance feature extraction. Building on MTL_SPP, MTL_SSW adds SW modules, which allow the semantic segmentation features and depth estimation features from the task-specific sub-networks to guide and optimize each other, so that the network can learn more discriminative features. Experimental results show that the two proposed methods obtain better results than state-of-the-art methods on both the NYUD_v2 and SUNRGBD datasets.

Key words: Multi-task learning, Semantic segmentation, Depth estimation, Squeeze and excitation, Selective weights

CLC Number: TP391

LUO Huilan, YE Ju. Study of Multi-task Learning with Joint Semantic Segmentation and Depth Estimation[J]. Computer Science, 2023, 50(6A): 220100111-10.
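
The squeeze-and-excitation (SE) mechanism named in the abstract can be illustrated with a minimal sketch. This shows only the generic SE pattern (global average pooling, a two-layer bottleneck, and channel-wise rescaling); it is not the authors' MTL_SPP sub-network. The function name `se_block`, the weight arguments `w1` and `w2`, and the omission of biases are all assumptions made for illustration.

```python
import math


def se_block(feature_map, w1, w2):
    """Generic squeeze-and-excitation sketch (hypothetical helper).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weight matrix of the bottleneck layer (r is an assumed
        reduction ratio; biases are omitted for brevity).
    w2: C x (C // r) weight matrix of the expansion layer.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, step 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates
```

In a trained network the weights would be learned, so that informative channels receive gates close to 1 and less useful channels are suppressed; with all-zero weights every gate is exactly 0.5.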

