Computer Science, 2023, Vol. 50, Issue (6A): 220300275-7. doi: 10.11896/jsjkx.220300275

• Artificial Intelligence •

Few-shot Segmentation Based on Multi-scale Prototype Hierarchical Matching

SUN Kaiwei, LIU Hu, RAN Xue, GUO Hao

  1. Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Online: 2023-06-10 Published: 2023-06-12
  • Corresponding author: LIU Hu (845904963@qq.com)
  • About author: SUN Kaiwei (sunkw@cqupt.edu.cn), born in 1987, Ph.D., associate professor. His main research interests include machine learning, data mining and big data analysis. LIU Hu, born in 1997, postgraduate. His main research interests include computer vision and semantic segmentation.
  • Supported by:
    Natural Science Foundation of Chongqing, China (cstc2019jcyj-msxmX0021), Science and Technology Research Program of Chongqing Municipal Education Commission (KJCXZD2020027) and National Natural Science Foundation of China (61806033).

Abstract: Traditional semantic segmentation usually requires a large amount of labeled training data and generalizes poorly to new categories. Few-shot segmentation aims to segment target objects of a new category in a query image using only a small number of annotated support images. Because support images are scarce, extracting representative guidance information from them is a key challenge for the few-shot segmentation task. To address this problem, this paper proposes a few-shot segmentation method based on multi-scale prototype hierarchical matching. First, the middle-level and high-level features of the query image and the support image are obtained through the residual network ResNet. Next, to capture richer contextual information about the target object, the extracted middle-level features are fed into a pyramid pooling module for multi-scale feature extraction. Finally, following the idea of prototype learning, prototypes are generated hierarchically from the middle-level and high-level features, then matched and refined to obtain the final predicted segmentation mask. Experiments on the PASCAL-5i dataset show that the proposed method achieves an mIoU of 66.7% in the 1-way 5-shot setting, which is 11.0% and 4.8% higher than the current mainstream PANet and PFENet models, respectively, demonstrating the effectiveness and advanced performance of the method.
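
The prototype-learning idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a prototype is formed by masked average pooling over support features and that query positions are labeled by cosine similarity to a foreground and a background prototype, which is a common formulation in prototype-based few-shot segmentation. The ResNet backbone, the pyramid pooling module, and the hierarchical matching and refinement step are omitted, and the function names (`masked_average_pooling`, `cosine_similarity`, `match_prototypes`) and the toy feature values are hypothetical.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where mask == 1.

    features: H x W grid of C-dimensional vectors (nested lists).
    mask:     H x W grid of 0/1 labels for the region of interest.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        raise ValueError("mask selects no positions")
    return [t / count for t in total]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors, with eps to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def match_prototypes(query_features, fg_proto, bg_proto):
    """Label a query position 1 if it is more similar to the foreground prototype."""
    return [
        [1 if cosine_similarity(v, fg_proto) > cosine_similarity(v, bg_proto) else 0
         for v in row]
        for row in query_features
    ]


# Toy 2x2 support feature map with 2 channels; the left column is the object.
support_features = [[[1.0, 0.0], [0.0, 1.0]],
                    [[0.9, 0.1], [0.1, 0.9]]]
support_mask = [[1, 0],
                [1, 0]]
background_mask = [[1 - m for m in row] for row in support_mask]

fg_proto = masked_average_pooling(support_features, support_mask)
bg_proto = masked_average_pooling(support_features, background_mask)

# Toy query feature map; here the object sits in the right column.
query_features = [[[0.2, 0.8], [0.8, 0.2]],
                  [[0.1, 1.0], [1.0, 0.1]]]
predicted_mask = match_prototypes(query_features, fg_proto, bg_proto)
print(predicted_mask)  # [[0, 1], [0, 1]]
```

In this toy case the foreground prototype points mostly along the first channel, so the query positions whose features do the same are labeled as foreground. A real system would compute the features with a network and operate on tensors rather than nested lists.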

Key words: Few-shot segmentation, Multi-scale, Semantic segmentation, Prototype learning, ResNet

CLC Number: 

  • TP391