Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220100111-10. doi: 10.11896/jsjkx.220100111

• Artificial Intelligence •

Study of Multi-task Learning with Joint Semantic Segmentation and Depth Estimation

LUO Huilan, YE Ju

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Online: 2023-06-10 Published: 2023-06-12
  • Corresponding author: LUO Huilan (825667187@qq.com)
  • About author: LUO Huilan, born in 1974, Ph.D., professor. Her main research interests include machine learning and pattern recognition, multi-task learning, etc.
  • Supported by:
    National Natural Science Foundation of China (61862031, 61462035), Education Teaching Reform Research Project of Jiangxi Province (JXYJG-2020-120), and Science and Technology Research Project of Jiangxi Provincial Education Department (GJJ200859, GJJ200884).

Abstract: Semantic segmentation and depth estimation are two highly related pixel-level image classification tasks. Starting from two perspectives, shared feature learning and feature interaction and fusion, this paper proposes two multi-task learning architectures to jointly learn semantic segmentation and depth estimation: a multi-task learning network based on the squeeze-and-excitation (SE) module and pyramid pooling (Multi-task Learning with SE and Pyramid Pooling, MTL_SPP), and a multi-task learning network based on SE and selective weights (SW) (Multi-task Learning with SE and Selective Weights, MTL_SSW). The MTL_SPP architecture consists of a shared backbone feature network and task-specific sub-networks; it uses the SE module to construct the task-specific sub-networks and pyramid pooling to enhance feature extraction. Building on MTL_SPP, MTL_SSW adds SW modules that allow the semantic segmentation features and depth estimation features from the task-specific sub-networks to guide and optimize each other, so that each sub-network learns features that are more discriminative for its task. Experimental results show that the two proposed methods obtain better results than state-of-the-art methods on both the NYUD_v2 and SUNRGBD datasets.

Key words: Multi-task learning, Semantic segmentation, Depth estimation, Squeeze and excitation, Selective weights
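
The abstract names two building blocks, SE channel gating and selective weighting between the two task branches, without giving their internals. The sketch below illustrates the general ideas in plain Python on nested lists shaped [channel][row][column]. The SE part follows the standard squeeze (global average pooling) and excitation (FC, ReLU, FC, sigmoid) recipe. The `selective_fuse` function is purely an assumption made for illustration: it mixes two feature maps with per-channel softmax weights and should not be read as the paper's SW module. All function names and weight shapes here are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def global_avg_pool(feat):
    """Squeeze step: reduce each channel's H x W map to its mean."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]


def linear(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_gate(feat, w1, b1, w2, b2):
    """Excitation step: FC -> ReLU -> FC -> sigmoid, one gate per channel."""
    squeezed = global_avg_pool(feat)
    hidden = [max(0.0, h) for h in linear(squeezed, w1, b1)]
    return [sigmoid(g) for g in linear(hidden, w2, b2)]


def se_block(feat, w1, b1, w2, b2):
    """Rescale every channel of `feat` by its SE gate."""
    gates = se_gate(feat, w1, b1, w2, b2)
    return [[[g * v for v in row] for row in ch] for ch, g in zip(feat, gates)]


def selective_fuse(feat_a, feat_b, w_a, w_b):
    """Hypothetical selective-weight fusion (an assumption, not the paper's SW module).

    Sums the two branches, pools the sum to a per-channel context vector,
    scores each branch with its own linear map, and blends the branches
    with per-channel softmax weights.
    """
    summed = [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
              for ca, cb in zip(feat_a, feat_b)]
    context = global_avg_pool(summed)
    zeros = [0.0] * len(context)
    score_a = linear(context, w_a, zeros)
    score_b = linear(context, w_b, zeros)
    # A two-way softmax equals the sigmoid of the score difference.
    alpha = [sigmoid(sa - sb) for sa, sb in zip(score_a, score_b)]
    return [[[al * a + (1.0 - al) * b for a, b in zip(ra, rb)]
             for ra, rb in zip(ca, cb)]
            for ca, cb, al in zip(feat_a, feat_b, alpha)]
```

In a two-task setting, one could apply `se_block` to the shared backbone features once per task with separate weights, then pass the segmentation and depth features through `selective_fuse` so that each channel draws more heavily on whichever branch scores higher. A real implementation would use a tensor library and learned weights; the pyramid pooling stage is not covered here.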

CLC Number:

  • TP391