Computer Science ›› 2026, Vol. 53 ›› Issue (3): 257-265. doi: 10.11896/jsjkx.250200094

• Computer Graphics & Multimedia •

Efficient Multi-view Stereo Reconstruction Based on Multi-granularity Feature Aggregation and Binary Search

XU Lijun, ZHAO Yujie, ZHAO Min, MA Weixuan, CHEN Kansong

  1. School of Computer Science, Hubei University, Wuhan 430062, China
  • Received: 2025-02-24 Revised: 2025-05-23 Online: 2026-03-12
  • Corresponding author: CHEN Kansong (kschen1999@aliyun.com)
  • About author: XU Lijun (xulijun@hubu.edu.cn), born in 1991, Ph.D., associate professor, is a member of CCF (No.62672M). Her main research interests include computer vision, artificial intelligence and digital twins.
    CHEN Kansong, born in 1972, Ph.D., postdoctoral researcher. His main research interests include artificial intelligence, digital twins, industrial Internet and related fields.
  • Supported by:
    Knowledge Innovation Program of Wuhan-Shuguang Project (2022010801020327) and Key Research and Development Program of Hubei Province (2022BAA045).

Abstract: In deep learning-based multi-view stereo (MVS) reconstruction, cost volume construction suffers from high computational complexity and memory consumption. Existing studies often employ cascade architectures or iterative optimization to reduce memory usage. However, the coarse-to-fine sampling strategy of cascade architectures may lose fine-grained detail, weakening the perception of critical features. To address this, this paper proposes a multi-view stereo network framework that combines binary search and multi-granularity feature aggregation within a cascade structure. The framework reduces memory overhead through the cascade architecture and uses a binary search strategy to partition the depth range into multiple candidate regions. A discrete classification method compresses the depth search space, which improves depth retrieval efficiency and lowers memory requirements. In addition, a multi-granularity feature aggregation strategy embeds coarse-grained global semantic information into fine-grained cost volume construction while still attending to fine-grained local texture. By fusing feature representations from different levels and introducing intra-view adaptive aggregation and view-wise adaptive weighting in the aggregation module, the model improves its perception of both global structure and local detail. Experimental results on the DTU and Tanks & Temples public datasets show that the proposed method achieves superior point cloud reconstruction quality while maintaining low memory consumption.

Key words: Multi-view stereo, Binary search strategy, Multi-granularity feature aggregation
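
The binary search strategy summarized in the abstract can be illustrated with a minimal single-pixel sketch. It is not the authors' network: the `classify` callback is a hypothetical stand-in for the learned discrete classifier, `oracle` is a toy replacement that already knows the true depth, and the depth range, stage count, and bin count are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def bin_centers(d_min: float, d_max: float, num_bins: int) -> List[float]:
    """Split [d_min, d_max] into equal-width bins and return each bin's center."""
    width = (d_max - d_min) / num_bins
    return [d_min + (k + 0.5) * width for k in range(num_bins)]


def binary_search_depth(
    d_min: float,
    d_max: float,
    classify: Callable[[List[float]], int],
    num_stages: int = 4,
    num_bins: int = 2,
) -> Tuple[float, float]:
    """Narrow a depth interval by repeated bin classification.

    `classify` is a hypothetical stand-in for the network: given the
    candidate bin centers, it returns the index of the selected bin.
    """
    for _ in range(num_stages):
        centers = bin_centers(d_min, d_max, num_bins)
        k = classify(centers)
        width = (d_max - d_min) / num_bins
        # Keep only the selected bin as the search range for the next stage.
        d_min, d_max = d_min + k * width, d_min + (k + 1) * width
    return d_min, d_max


def oracle(true_depth: float) -> Callable[[List[float]], int]:
    """Toy classifier (assumption): pick the bin center closest to a known depth."""
    def pick(centers: List[float]) -> int:
        return min(range(len(centers)), key=lambda k: abs(centers[k] - true_depth))
    return pick


# Illustrative depth range and target depth (assumed values, not from the paper).
lo, hi = binary_search_depth(425.0, 935.0, oracle(600.0), num_stages=4, num_bins=2)
estimate = 0.5 * (lo + hi)
```

With two bins per stage, four stages evaluate eight depth hypotheses in total and shrink the interval to 1/16 of its original width, whereas uniform sampling at the same resolution would need 16 hypotheses in a single cost volume. This is the kind of saving that a classification-based search over candidate regions aims for.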

CLC Number:

  • TP391.41