Research and Implementation of Dynamic Scene 3D Perception Technology Based on Binocular Estimation

doi:10.11896/jsjkx.240300045

Abstract

Binocular stereo vision has long been an important topic in computer vision research. Compared with monocular or multi-camera approaches, binocular stereo vision offers low cost, high versatility, and ease of use while still recovering image depth accurately. Three-dimensional perception based on binocular vision can greatly improve a computer's ability to understand and interact with the real world, further enhance the adaptability of computer vision in complex and changing scenes, and play an important role in autonomous driving, robot navigation, industrial inspection, aerospace, and other fields. This paper focuses on 3D reconstruction and object perception in dynamic scenes. In most cases, the dynamic objects in the field of view are the ones that need attention, while static objects, especially the background and other static objects that occupy most of the scene, can be ignored. These static regions nevertheless consume a large share of resources in the actual computation, and spending computing resources on targets that are of no interest is both pointless and inefficient. To solve this problem, and building on an in-depth study of current mainstream binocular stereo matching and image segmentation methods, this paper proposes a dynamic scene 3D perception technology based on binocular estimation. The main innovations and research achievements are as follows. First, to address the high cost and low efficiency of pixel-by-pixel cost computation and aggregation in traditional binocular stereo matching algorithms, a binocular stereo matching method based on two-dimensional scene instance segmentation is proposed, in which the mask-segmented target image is used for stereo matching; this not only improves matching performance but also reduces the difficulty of matching dynamic targets. At the same time, to compensate for insufficient segmentation accuracy, a mask edge filtering optimization method based on the RGB image is introduced to improve efficiency and the reconstruction accuracy of the field-of-view point cloud. Second, real-time target point clouds are generated using a binocular-estimation deep learning network, and a real-time dynamic target perception algorithm based on GPU-accelerated processing of neighboring-frame point clouds is proposed. Finally, a real-time perception technology for two-dimensional and three-dimensional dynamic objects is proposed, which can quickly recognize dynamic objects in the detection environment while performing real-time three-dimensional reconstruction of the target scene.
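As a rough illustration of why restricting stereo processing to a segmentation mask saves work, the sketch below back-projects only the masked pixels of a disparity map into 3D points, using the standard rectified-stereo relation Z = f·B/d. It is not the paper's method: the abstract describes a deep learning network for depth estimation, and the function name `masked_disparity_to_points`, its parameters, and the toy values are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def masked_disparity_to_points(
    disparity: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project masked pixels of a rectified disparity map to 3D.

    Hypothetical helper: assumes a rectified stereo pair, a focal length
    in pixels, a baseline in metres, and a principal point (cx, cy).
    Pixels outside the mask or with non-positive disparity are skipped,
    so static background never reaches the point cloud.
    """
    points: List[Point3D] = []
    for v, (disp_row, mask_row) in enumerate(zip(disparity, mask)):
        for u, (d, keep) in enumerate(zip(disp_row, mask_row)):
            if not keep or d <= 0.0:
                continue
            z = focal_px * baseline_m / d   # depth from disparity
            x = (u - cx) * z / focal_px     # pinhole back-projection
            y = (v - cy) * z / focal_px
            points.append((x, y, z))
    return points


# Toy 2x2 example (values are illustrative only).
disparity = [[0.0, 10.0], [20.0, 5.0]]
mask = [[True, True], [False, True]]
points = masked_disparity_to_points(
    disparity, mask, focal_px=100.0, baseline_m=0.5, cx=0.0, cy=0.0
)
```

In this toy case only two of the four pixels produce points: one is rejected for zero disparity and one because it lies outside the mask. The same filtering idea applies regardless of how the disparity map is produced.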

Key words: Binocular vision, Stereo matching, Image segmentation, 3D reconstruction, Deep learning, GPU parallel computing

CLC Number: P231

HE Weilong, SU Lingli, GUO Bingxuan, LI Maosen, HAO Yan. Research and Implementation of Dynamic Scene 3D Perception Technology Based on Binocular Estimation [J]. Computer Science, 2024, 51(11A): 240300045-8.

