Computer Science, 2026, Vol. 53, Issue (3): 257-265. doi: 10.11896/jsjkx.250200094

• Computer Graphics & Multimedia •

Efficient Multi-view Stereo Reconstruction Based on Multi-granularity Feature Aggregation and Binary Search

XU Lijun, ZHAO Yujie, ZHAO Min, MA Weixuan, CHEN Kansong   

  1. School of Computer Science, Hubei University, Wuhan 430062, China
  • Received: 2025-02-24 Revised: 2025-05-23 Published: 2026-03-12
  • About author: XU Lijun, born in 1991, Ph.D., associate professor, is a member of CCF (No. 62672M). Her main research interests include computer vision, artificial intelligence and digital twins.
    CHEN Kansong, born in 1972, Ph.D., postdoctoral researcher. His main research interests include artificial intelligence, digital twins, industrial Internet and related fields.
  • Supported by:
    Knowledge Innovation Program of Wuhan-Shuguang Project (2022010801020327) and Key Research and Development Program of Hubei Province (2022BAA045).

Abstract: In deep learning-based multi-view stereo (MVS) reconstruction, cost volume construction suffers from high computational complexity and memory consumption. Existing studies often employ cascade architectures or iterative optimization methods to reduce memory usage. However, the coarse-to-fine sampling strategy in cascade structures may lose fine-grained details, weakening the perception of critical features. To address this, this paper proposes a multi-view stereo network framework based on a cascade structure with binary search and multi-granularity feature aggregation. The framework reduces memory overhead through a cascade architecture and employs a binary search strategy to partition the depth range into multiple candidate regions. A discrete classification method compresses the depth search space, improving depth retrieval efficiency and lowering memory requirements. Furthermore, a multi-granularity feature aggregation strategy embeds coarse-grained global semantic information into fine-grained cost volume construction while preserving attention to fine-grained local texture details. By fusing multi-level feature representations and incorporating intra-view adaptive aggregation and view-wise adaptive weighting in the aggregation module, the model enhances its perception of both global structures and local details. Experimental results on the DTU and Tanks & Temples benchmark datasets show that the proposed method achieves superior point cloud reconstruction quality while maintaining low memory consumption.

Key words: Multi-view stereo, Binary search strategy, Multi-granularity feature aggregation

CLC Number: TP391.41
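
The binary search strategy described in the abstract, which partitions the depth range into candidate regions and selects one by discrete classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `num_bins` parameter, and the `oracle` classifier are hypothetical. In the actual network the bin choice would come from a learned classification over a cost volume; here an oracle that knows the true depth stands in for it so the narrowing of the search interval can be followed by hand.

```python
from typing import Callable, List, Tuple


def bin_edges(lo: float, hi: float, num_bins: int) -> List[float]:
    """Split [lo, hi] into num_bins equal-width candidate regions."""
    step = (hi - lo) / num_bins
    return [lo + i * step for i in range(num_bins + 1)]


def binary_depth_search(
    lo: float,
    hi: float,
    classify: Callable[[List[float]], int],
    num_stages: int,
    num_bins: int = 2,
) -> Tuple[float, Tuple[float, float], int]:
    """Narrow a depth interval stage by stage.

    At each stage the current interval is split into num_bins regions,
    `classify` picks one region from the region centers, and the search
    continues inside that region only.
    Returns (depth estimate, final interval, hypotheses evaluated).
    """
    evaluated = 0
    for _ in range(num_stages):
        edges = bin_edges(lo, hi, num_bins)
        centers = [(edges[i] + edges[i + 1]) / 2 for i in range(num_bins)]
        k = classify(centers)       # discrete classification over regions
        evaluated += num_bins       # only num_bins hypotheses per stage
        lo, hi = edges[k], edges[k + 1]
    return (lo + hi) / 2, (lo, hi), evaluated


def oracle(true_depth: float) -> Callable[[List[float]], int]:
    """Hypothetical stand-in for the network: picks the nearest center."""
    def classify(centers: List[float]) -> int:
        return min(range(len(centers)), key=lambda i: abs(centers[i] - true_depth))
    return classify


# Assumed toy depth range [0, 1024] and true depth 301 (illustrative values).
estimate, interval, evaluated = binary_depth_search(
    0.0, 1024.0, oracle(301.0), num_stages=8, num_bins=2
)
print(estimate, interval, evaluated)
```

With two regions per stage and eight stages, the sketch evaluates 16 depth hypotheses in total, yet reaches the resolution that a single dense sweep would need 256 hypotheses to match. This is the sense in which a staged classification compresses the depth search space.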