Computer Science (计算机科学), 2020, Vol. 47, Issue (2): 126-134. doi: 10.11896/jsjkx.190100119

• Computer Graphics & Multimedia •

Image Semantic Segmentation Based on Deep Feature Fusion

ZHOU Peng-cheng1, GONG Sheng-rong1,2, ZHONG Shan1,2, BAO Zong-ming1, DAI Xing-hua1

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2. School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou, Jiangsu 215500, China
  • Received: 2019-01-15 Online: 2020-02-15 Published: 2020-03-18
  • Corresponding author: GONG Sheng-rong (shrgong@suda.edu.cn)
  • About author: ZHOU Peng-cheng, born in 1992, postgraduate. His main research interests include digital image processing, computer vision and pattern recognition; GONG Sheng-rong, born in 1966, Ph.D., professor, Ph.D. supervisor, is the vice chairman of Suzhou CCF Association. His main research interests include machine learning and computer vision.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61272005, 61702055), the Natural Science Foundation of Jiangsu Province (BK20151254, BK20151260), the Six Peak Talents Project of Jiangsu Province (DZXX-027) and the Cloud Integration Science and Education Innovation Foundation of the Ministry of Education Science and Technology Development Center (2017B03112).

Abstract: When convolutional networks are used for feature extraction in image semantic segmentation, the repeated combination of max pooling and downsampling reduces feature resolution. Context information is lost as a result, and the segmentation output becomes insensitive to object location. Networks based on the encoder-decoder architecture gradually refine the output through skip connections while restoring resolution, but simply summing adjacent features ignores the differences between them and easily leads to problems such as local misidentification of objects. To address this, an image semantic segmentation method based on deep feature fusion is proposed. It adopts a network structure in which multiple fully convolutional VGG16 models are combined in parallel, uses atrous convolution to process the multi-scale images of a pyramid efficiently in parallel, extracts context features at multiple levels, and fuses them layer by layer in a top-down manner to capture as much context information as possible. In addition, a layer-by-layer label supervision strategy, obtained by improving the loss function, serves as auxiliary support and is combined with a fully connected conditional random field that models pixels at the back end; together they make the model easier to train and its predictions more accurate. Experiments show that fusing, layer by layer, the deep features that represent context at different scales improves both the classification of target objects and the localization of spatial details. On the PASCAL VOC 2012 and PASCAL CONTEXT datasets, the proposed method achieves mIoU of 80.5% and 45.93%, respectively. The experimental data demonstrate that deep feature extraction in the parallel framework, layer-by-layer feature fusion and layer-by-layer label supervision jointly optimize the architecture. A comparison of features shows that the model captures rich context information and obtains finer image semantic features, with clear advantages over comparable methods.
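
The abstract relies on atrous convolution to enlarge the context each feature sees without the loss of resolution caused by pooling and downsampling. A minimal 1-D sketch of that idea in plain Python follows. It is not the authors' implementation: the function names `atrous_conv1d` and `receptive_field`, the zero padding, and the odd-length kernel are assumptions made for illustration only.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Same-length 1-D atrous (dilated) convolution with zero padding.

    As in most deep learning code, this is a cross-correlation: the kernel
    is not flipped. Kernel taps are spaced `rate` samples apart, so the
    receptive field widens while the output keeps the input length.
    Assumes an odd-length kernel (illustrative choice).
    """
    half = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * rate  # dilated tap position
            if 0 <= idx < len(signal):   # taps outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


signal = [1, 2, 3, 4, 5, 6, 7]
dense = atrous_conv1d(signal, [1, 1, 1], rate=1)   # spans 3 samples
atrous = atrous_conv1d(signal, [1, 1, 1], rate=2)  # spans 5 samples
```

With the same three weights, `rate=2` covers five input samples instead of three, and the output still has one value per input position. That is the property the method exploits in place of further pooling.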

Key words: Atrous convolution, Conditional random field, Context information, Deep feature, Feature fusion, Image semantic segmentation
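
The results in the abstract are reported as mIoU (mean intersection over union). For readers unfamiliar with the metric, a minimal sketch in plain Python is given below. The function name `mean_iou`, the flattened label lists, and the use of 255 as an ignored "void" label are assumptions for illustration; benchmark evaluations accumulate the counts over a whole dataset rather than over a single image.

```python
def mean_iou(pred, truth, num_classes, ignore_label=255):
    """Mean intersection over union for flattened integer label lists.

    Pixels whose ground-truth label equals `ignore_label` are skipped
    (an assumed convention). Classes that appear in neither prediction
    nor ground truth are left out of the average.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, truth):
        if t == ignore_label:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their mean, 7/12.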

CLC number:

  • TP391