Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240300045-8. doi: 10.11896/jsjkx.240300045

• Image Processing & Multimedia Technology •

  • Author email: 1205488897@qq.com

Research and Implementation of Dynamic Scene 3D Perception Technology Based on Binocular Estimation

HE Weilong1, SU Lingli1, GUO Bingxuan2, LI Maosen3, HAO Yan1   

  1. 1 Jiuquan Vocational and Technical College, Jiuquan, Gansu 735000, China
    2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China
    3 Nuclear Industry Aerial Surveying and Remote Sensing Center, Baoding, Hebei 071799, China
  • Online:2024-11-16 Published:2024-11-13
  • About author: HE Weilong, born in 1993, master, lecturer. His main research interests include photogrammetry, remote sensing technology, and computer vision.
    SU Lingli, born in 1992, master, lecturer. Her main research interests include municipal engineering and computer vision.
  • Supported by:
    National Key Research and Development Program of China (2019YFE0108300), National Natural Science Foundation of China (62001058), 2023 Gansu Province Higher Education Innovation Fund Project (2023B-449), 2023 Jiuquan City Science and Technology Support Project (2060499) and School-level Scientific Research Project (2022XJYXM06).


Abstract: Binocular stereo vision has long been an important topic in computer vision research. Unlike monocular or multi-camera approaches, binocular stereo vision can recover image depth accurately while remaining low-cost, widely applicable, and easy to use. Three-dimensional perception based on binocular vision greatly improves a computer's ability to understand and interact with the real world, strengthens the adaptability of computer vision systems in complex and changing scenes, and plays an important role in autonomous driving, robot navigation, industrial inspection, aerospace, and other fields. This paper focuses on 3D reconstruction and object perception in dynamic scenes. In most cases, the dynamic objects in the field of view are the ones that require attention, while static content, especially the background and static objects that occupy most of the scene, can usually be ignored; yet in practice this static content consumes a large share of the computation. Spending substantial computing resources on objects of no interest is clearly wasteful and inefficient. To address this problem, building on a study of current mainstream binocular stereo matching and image segmentation methods, this paper proposes a dynamic scene 3D perception technique based on binocular estimation. The main innovations and results are as follows. First, to overcome the inefficiency of pixel-by-pixel cost aggregation in traditional binocular stereo matching algorithms, a stereo matching method based on 2D scene instance segmentation is proposed: matching is performed on mask-segmented object images, which improves matching performance and also reduces the difficulty of matching dynamic objects. To compensate for limited segmentation accuracy, a mask edge filtering optimization based on the RGB image is introduced, improving both efficiency and the reconstruction accuracy of the field-of-view point cloud. Second, object point clouds are produced in real time with a binocular-estimation deep learning network, and a real-time dynamic object perception algorithm based on GPU-accelerated comparison of neighboring-frame point clouds is proposed. Finally, an integrated 2D-3D real-time dynamic object perception technique is presented, which performs real-time 3D reconstruction of the target scene while quickly detecting and recognizing dynamic objects in the environment.
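The mask-guided matching idea described in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation (which builds on deep stereo networks); it only shows, using naive SAD block matching in NumPy, how restricting cost computation to an instance mask avoids spending aggregation work on background pixels. Function names and parameters here are illustrative.

```python
import numpy as np

def masked_block_match(left, right, mask, max_disp=8, win=3):
    """Naive SAD block matching restricted to an instance mask.

    left, right: 2-D grayscale arrays from a rectified stereo pair.
    mask: boolean array; disparity is computed only where mask is True,
    which is the cost-aggregation saving that mask-guided matching exploits.
    """
    h, w = left.shape
    r = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            if not mask[y, x]:
                continue  # background pixel: skip all cost aggregation
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            # SAD cost of matching this patch at each candidate disparity
            costs = [np.abs(patch - right[y - r:y + r + 1,
                                          x - d - r:x - d + r + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

In the paper the per-object matching is performed by a learned stereo network rather than SAD, but the masking principle, matching only inside segmented object regions, is the same.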
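The neighboring-frame point cloud comparison can likewise be sketched in a few lines. The paper performs this search on the GPU for real-time rates; the hedged CPU version below uses brute-force NumPy broadcasting instead, and the distance threshold is an illustrative value, not one from the paper.

```python
import numpy as np

def dynamic_points(prev_cloud, curr_cloud, thresh=0.05):
    """Flag points of the current frame with no close counterpart in the
    previous frame; such points likely belong to moving objects.

    prev_cloud, curr_cloud: (M, 3) and (N, 3) arrays of XYZ points.
    Returns a boolean array of length N marking dynamic points.
    """
    # (N, M) matrix of pairwise Euclidean distances via broadcasting
    diff = curr_cloud[:, None, :] - prev_cloud[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    return dists.min(axis=1) > thresh
```

A GPU version would replace the broadcasted distance matrix with a parallel nearest-neighbour kernel; the per-point thresholding logic is unchanged.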

Key words: Binocular vision, Stereo matching, Image segmentation, 3D reconstruction, Deep learning, GPU parallel computing

CLC number: P231