Computer Science, 2023, Vol. 50, Issue 6A: 220100111-10. doi: 10.11896/jsjkx.220100111
LUO Huilan (罗会兰), YE Ju (叶桔)
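Abstract: Semantic segmentation and depth estimation are both pixel-level prediction tasks on images, and the two are highly correlated. Approaching the problem from two angles, shared feature learning and interactive feature fusion, this paper proposes two multi-task learning architectures that jointly learn semantic segmentation and depth estimation: a multi-task learning network based on the Squeeze-and-Excitation (SE) module and pyramid pooling (Multi-task Learning with SE and Pyramid Pooling, MTL_SPP), and a multi-task learning network based on SE and Selective Weights (SW) (Multi-task Learning with SE and Selective Weights, MTL_SSW). MTL_SPP consists of a shared backbone feature network and task-specific sub-networks; the sub-networks are built from SE modules, and pyramid pooling is used to strengthen feature extraction. MTL_SSW builds on MTL_SPP by letting the semantic segmentation features and depth estimation features of the task-specific sub-networks guide and refine each other through an SW module, so that each task learns more discriminative features. Experimental results show that both proposed methods outperform state-of-the-art methods on the NYUD_v2 and SUNRGBD datasets.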
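The SE-based task-specific branches described in the abstract can be illustrated with a minimal NumPy sketch. It shows only the standard Squeeze-and-Excitation recalibration (global average pooling, a bottleneck with ReLU, a sigmoid gate, channel-wise scaling) applied separately per task to one shared feature map. The shapes, the reduction ratio `r`, the random weights, and the names `se_block`, `shared`, and `task_weights` are assumptions for illustration; the abstract does not specify the authors' layer configuration, and pyramid pooling and the SW module are not modelled here.

```python
import numpy as np

def se_block(features, w1, w2):
    """Squeeze-and-Excitation recalibration of a (C, H, W) feature map.

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    Returns the recalibrated features and the per-channel gate.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = features.mean(axis=(1, 2))                   # shape (C,)
    # Excitation: bottleneck with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(w1 @ z, 0.0)                 # shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # shape (C,)
    # Scale: reweight every channel of the input by its gate.
    return features * gate[:, None, None], gate

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                              # assumed toy sizes
shared = rng.standard_normal((C, H, W))              # stand-in for shared backbone features

# Hypothetical, untrained SE weights: one independent set per task.
task_weights = {
    "segmentation": (rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r))),
    "depth": (rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r))),
}

task_features, task_gates = {}, {}
for name, (w1, w2) in task_weights.items():
    task_features[name], task_gates[name] = se_block(shared, w1, w2)
```

Because each task owns its gate, the same shared channels can be emphasised for one task and suppressed for the other, which is the general motivation for placing SE modules in task-specific branches.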