Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231100005-7. doi: 10.11896/jsjkx.231100005

• Image Processing & Multimedia Technology •

Stereo Matching Network Based on Improved Superpixel Sampling

XU Haidong1,2, ZHANG Zili1,2,3, HU Xinrong1,2,3, PENG Tao1,2,3, ZHANG Jun4

  1 Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, China
    2 School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China
    3 Hubei Provincial Engineering Research Center for Intelligent Textile and Fashion, Wuhan 430200, China
    4 School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
  • Online: 2024-11-16  Published: 2024-11-13
  • Corresponding author: ZHANG Zili (zlzhang@wtu.edu.cn)
  • About author: (2846761532@qq.com)
  • Supported by:
    Science and Technology Research Project of Education Department of Hubei Province (B2017066)

Stereo Matching Network Based on Enhanced Superpixel Sampling

XU Haidong1,2, ZHANG Zili1,2,3, HU Xinrong1,2,3, PENG Tao1,2,3, ZHANG Jun4

  1 Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, China
    2 School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China
    3 Hubei Provincial Engineering Research Center for Intelligent Textile and Fashion, Wuhan 430200, China
    4 School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
  • Online: 2024-11-16  Published: 2024-11-13
  • About author: XU Haidong, born in 1999, postgraduate, is a member of CCF (No. Q6975G). His main research interests include machine learning and image processing.
    ZHANG Zili, born in 1981, Ph.D., lecturer, is a member of CCF (No. 99006M). His main research interests include machine learning and image processing.
  • Supported by:
    Science and Technology Research Project of Education Department of Hubei Province(B2017066).

Abstract: To address the problems of detail loss, occlusion, and low matching accuracy in textureless regions in stereo matching, a stereo matching method based on improved superpixel sampling is proposed. First, an improved superpixel sampling method downsamples the high-resolution input images used for stereo matching. The downsampled image pair is then fed into the stereo matching network, where a weight-sharing convolutional network extracts features, 3D convolutions build a cost volume from the fused features and generate a disparity map, and the output disparity map is upsampled to recover the final disparity map. To counter the loss of detail during superpixel sampling, which degrades subsequent matching accuracy, a Feature Pyramid Attention (FPA) module and an improved residual structure are introduced. Building on these two innovations, a superpixel-sampling-based stereo matching network, FPSMnet (Feature Pyramid Stereo Matching Network), is proposed, and the image datasets BSDS500 and NYUv2 are selected and partitioned for training, validation, and testing of superpixel sampling. Stereo matching experiments show that, compared with the baseline method, the proposed algorithm reduces the average pixel error by 0.25 on SceneFlow and by 0.52 on HR-VS, improving matching accuracy without affecting runtime.
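The first stage of the pipeline, downsampling via superpixel sampling, can be illustrated with a much-simplified sketch. Here regular grid cells stand in for the learned, content-adaptive superpixels described above, and each cell is reduced to its mean value; the function name and `cell` parameter are illustrative, not from the paper.

```python
import numpy as np

def grid_superpixel_downsample(img, cell=4):
    """Average the pixels inside each cell x cell block.

    A crude stand-in for learned superpixel sampling: real superpixels
    have irregular, image-adaptive boundaries, whereas this fixed grid
    only demonstrates the downsampling bookkeeping.
    """
    h, w = img.shape[:2]
    h, w = h - h % cell, w - w % cell          # crop to a multiple of cell
    blocks = img[:h, :w].reshape(h // cell, cell, w // cell, cell, -1)
    return blocks.mean(axis=(1, 3)).squeeze()  # one value per block
```

In the actual network, the per-superpixel association is learned end to end, and the low-resolution disparity map is later upsampled back through the same assignment.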

Keywords: Deep learning, Superpixels, Stereo matching, Attention mechanism

Abstract: Aiming at the accuracy challenges in stereo matching related to detail loss, occlusion, and textureless regions, a stereo matching method based on improved superpixel sampling is proposed. Initially, an enhanced superpixel sampling method is employed to downsample the high-resolution input images used for stereo matching. Subsequently, the downsampled image pairs are fed into the stereo matching network, where a convolutional network with shared weights is used for feature extraction. Using 3D convolution, a feature-fused cost volume is generated, from which a disparity map is produced. The output disparity map is then upsampled to reconstruct the final disparity map. To tackle potential detail loss during the superpixel sampling process, two innovations are introduced: a feature pyramid attention (FPA) module and an improved residual structure. Based on these two innovations, a stereo matching network named FPSMnet (feature pyramid stereo matching network) is proposed. This paper selects and partitions the image datasets BSDS500 and NYUv2 for training, validation, and testing of superpixel sampling. Experimental results in stereo matching demonstrate that, compared with the baseline method, the proposed algorithm reduces the average pixel error by 0.25 and 0.52 on the SceneFlow and HR-VS datasets, respectively. These improvements are achieved without compromising runtime efficiency.
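The reported gains are in average pixel error, i.e. the mean absolute difference between predicted and ground-truth disparities (commonly called end-point error, EPE). A minimal sketch of that metric, assuming dense disparity maps and an optional validity mask; the function and argument names are illustrative, not from the paper:

```python
import numpy as np

def average_pixel_error(pred_disp, gt_disp, valid=None):
    """Mean absolute disparity difference (end-point error).

    valid: optional boolean mask selecting pixels that have ground
    truth, e.g. to exclude occluded or unlabeled regions.
    """
    err = np.abs(np.asarray(pred_disp, dtype=np.float64) - gt_disp)
    if valid is not None:
        err = err[valid]
    return float(err.mean())
```

With such a metric, "a reduction of 0.25" means the mean disparity error over all evaluated pixels drops by a quarter of a pixel relative to the baseline.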

Key words: Deep learning, Superpixels, Stereo matching, Attention mechanism

CLC number: TP391