Computer Science (计算机科学), 2019, Vol. 46, Issue (9): 36-46. doi: 10.11896/j.issn.1002-137X.2019.09.005
王嫣然1, 陈清亮1, 吴俊君2
WANG Yan-ran1, CHEN Qing-liang1, WU Jun-jun2
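Abstract: Image semantic segmentation is one of the most important fundamental technologies in visual intelligence. The quality of semantic segmentation determines how well an intelligent system understands its application scene, so it has considerable application value in important fields such as autonomous driving, robot cognition and navigation, security surveillance, and UAV landing systems. Targets in complex environments are unstructured, diverse, and irregular in shape, and they are subject to interfering factors such as illumination changes, viewpoint changes, scale changes, and object occlusion, all of which pose great challenges to image semantic segmentation. In recent years, benefiting from the rapid development of deep learning theory, a large number of representative research results have emerged in image semantic segmentation. To inspire academic research on image semantic segmentation and the engineering development of related intelligent systems, this paper first gives a comprehensive account of the development of image semantic segmentation methods and divides them into three groups: traditional image semantic segmentation methods, methods that combine traditional approaches with deep learning, and methods based on deep learning. Second, starting from the problems that image semantic segmentation faces in complex environments, it analyzes and compares in detail the models, algorithms, performance, and open problems of the semantic segmentation methods for complex environments that have emerged in recent years, organized into strongly supervised, weakly supervised, and unsupervised methods. It then summarizes nine current mainstream datasets that cover various complex environments, including PASCAL VOC, Cityscapes, and SUN RGB-D, together with three evaluation metrics: PA, mPA, and mIoU. Finally, it summarizes the research on image semantic segmentation for complex environments and discusses prospects for its development in real-time video segmentation, 3D scene reconstruction, and unsupervised semantic segmentation.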
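The three evaluation metrics named in the abstract can be illustrated with a small sketch. It assumes the commonly used definitions, since the abstract itself does not give formulas: PA (pixel accuracy) is the fraction of correctly labeled pixels, mPA (mean pixel accuracy) averages the per-class accuracy, and mIoU (mean intersection over union) averages the per-class ratio of intersection to union. The function names `confusion_matrix` and `segmentation_metrics` and the toy label maps are hypothetical and chosen only for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return (PA, mPA, mIoU) computed from a square confusion matrix."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]                    # correctly labeled pixels per class
    row_sum = [sum(cm[i]) for i in range(k)]               # ground-truth pixels per class
    col_sum = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted pixels per class

    pa = sum(diag) / total
    # Assumption: classes absent from the ground truth are skipped for mPA,
    # and classes absent from both maps are skipped for mIoU.
    acc = [diag[i] / row_sum[i] for i in range(k) if row_sum[i] > 0]
    union = [row_sum[i] + col_sum[i] - diag[i] for i in range(k)]
    iou = [diag[i] / union[i] for i in range(k) if union[i] > 0]
    return pa, sum(acc) / len(acc), sum(iou) / len(iou)


# Toy example: two flattened label maps with two classes (hypothetical data).
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]
pa, mpa, miou = segmentation_metrics(confusion_matrix(y_true, y_pred, 2))
print(f"PA={pa:.3f} mPA={mpa:.3f} mIoU={miou:.3f}")
```

Working from a confusion matrix keeps all three metrics consistent with one another and makes it easy to accumulate counts over a whole dataset before dividing. Benchmarks differ in how they treat ignored labels and absent classes, so the skipping rule here is one plausible choice and not a fixed standard.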