Computer Science, 2025, Vol. 52, Issue (3): 231-238. doi: 10.11896/jsjkx.231200111

• Computer Graphics & Multimedia •

Multi-view Stereo Reconstruction with Context-guided Cost Volume and Depth Refinement

CHEN Guangyuan, WANG Zhaohui, CHENG Ze   

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China
  • Received: 2023-12-18 Revised: 2024-05-29 Online: 2025-03-15 Published: 2025-03-07
  • About author: CHEN Guangyuan, born in 2001, postgraduate. His main research interests include multi-view stereo and 3D reconstruction.
    WANG Zhaohui, born in 1967, professor, Ph.D. supervisor. His main research interests include advanced computer control technology and biomedical information processing.
  • Supported by:
    National Natural Science Foundation of China (62302351).

Abstract: Deep learning-based multi-view stereo (MVS) reconstruction algorithms suffer from incomplete image feature extraction, ambiguous cost volume matching, and accumulated depth errors, which lead to poor reconstruction in textureless and repetitive-texture regions. To address these problems, a cascaded MVS network based on context-guided cost volume construction and depth refinement is proposed. First, a feature fusion module based on non-reference attention filters out irrelevant features and resolves the inconsistency among multi-scale features through feature fusion. Then, a context-guided cost volume module fuses global information to improve the accuracy and robustness of cost volume matching. Finally, a depth refinement module learns and reduces depth errors to improve the accuracy of the low-resolution depth maps. Experimental results show that, compared with MVSNet, the network reduces the completeness error on the DTU dataset by 24.4%, the accuracy error by 4.1%, and the overall error by 14.3%. Its performance on the Tanks and Temples dataset is also better than that of most algorithms, showing strong competitiveness.

Key words: Multi-view stereo, Feature fusion, Context-guided, Cost volume matching, Depth refinement

CLC Number: 

  • TP391
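
The error reductions quoted in the abstract are relative to the MVSNet baseline. The following is a minimal sketch of how such percentages are typically derived, assuming the usual DTU convention that the overall error is the mean of the accuracy and completeness errors. The function names and the numeric values are hypothetical illustrations and are not taken from the paper.

```python
def overall_error(accuracy_mm, completeness_mm):
    """DTU-style overall error: mean of accuracy and completeness (lower is better)."""
    return (accuracy_mm + completeness_mm) / 2.0


def relative_reduction(baseline, ours):
    """Percentage by which `ours` lowers the error relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - ours) / baseline * 100.0


# Hypothetical error values in millimetres, NOT the paper's measurements.
baseline = {"accuracy": 0.40, "completeness": 0.50}
proposed = {"accuracy": 0.38, "completeness": 0.40}

reductions = {
    name: relative_reduction(baseline[name], proposed[name])
    for name in ("accuracy", "completeness")
}
reductions["overall"] = relative_reduction(
    overall_error(**{k + "_mm": v for k, v in baseline.items()}),
    overall_error(**{k + "_mm": v for k, v in proposed.items()}),
)

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower than baseline")
```

Note that the overall reduction is computed from the averaged errors, so it is generally not the average of the two individual percentages.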