Computer Science (计算机科学), 2026, Vol. 53, Issue 3: 257-265. doi: 10.11896/jsjkx.250200094
许立君, 赵宇杰, 赵敏, 马为駽, 陈侃松
XU Lijun, ZHAO Yujie, ZHAO Min, MA Weixuan, CHEN Kansong
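Abstract: In deep-learning-based multi-view stereo (MVS) reconstruction, cost volume construction faces high computational complexity and high memory consumption. Existing work mostly adopts cascade architectures or iterative optimization to reduce memory consumption, but the coarse-to-fine sampling strategy of cascade architectures can lose detail information and weaken the perception of key features. To address this, a multi-view stereo network framework is proposed that combines binary search with multi-granularity feature aggregation on a cascade structure. The framework reduces memory usage through the cascade architecture, uses a binary search strategy to divide the depth range into several candidate regions, and compresses the depth search space through discrete classification, which improves depth retrieval efficiency and lowers memory requirements. In addition, a multi-granularity feature aggregation strategy is proposed that embeds coarse-grained global semantic information into fine-grained cost volume construction while also attending to fine-grained local texture information. By fusing feature representations from different levels and introducing intra-view adaptive aggregation and view-wise adaptive weighting in the aggregation module, the model's perception of global structure and local detail features is enhanced. Experimental results show that, on the public DTU and Tanks & Temples datasets, the method achieves excellent point cloud reconstruction while maintaining low memory consumption.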
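The binary search idea in the abstract can be illustrated with a minimal per-pixel sketch. This is not the authors' network: the function names (`split_bins`, `binary_search_depth`, `make_oracle`) and the `score_fn` callback are hypothetical, and the scoring oracle merely stands in for the classification output that a learned cost volume would provide. The depth range used in the usage note is illustrative only.

```python
from typing import Callable, List, Sequence, Tuple


def split_bins(lo: float, hi: float, num_bins: int) -> List[Tuple[float, float]]:
    """Divide the depth range [lo, hi] into num_bins equal candidate regions."""
    width = (hi - lo) / num_bins
    return [(lo + i * width, lo + (i + 1) * width) for i in range(num_bins)]


def binary_search_depth(
    lo: float,
    hi: float,
    score_fn: Callable[[Sequence[float]], Sequence[float]],
    num_stages: int = 4,
    num_bins: int = 2,
) -> float:
    """Narrow the depth range stage by stage.

    At each stage only num_bins hypotheses (the bin centers) are scored,
    the best-scoring bin is chosen as a discrete class, and the search
    continues inside that bin.
    """
    for _ in range(num_stages):
        bins = split_bins(lo, hi, num_bins)
        centers = [(a + b) / 2.0 for a, b in bins]
        scores = score_fn(centers)  # assumption: higher score = better match
        best = max(range(num_bins), key=lambda i: scores[i])
        lo, hi = bins[best]
    return (lo + hi) / 2.0


def make_oracle(true_depth: float) -> Callable[[Sequence[float]], List[float]]:
    """Hypothetical stand-in for a learned classifier: prefers the closest center."""
    def score_fn(centers: Sequence[float]) -> List[float]:
        return [-abs(c - true_depth) for c in centers]
    return score_fn
```

For example, `binary_search_depth(425.0, 935.0, make_oracle(612.0), num_stages=8, num_bins=2)` scores only 16 depth hypotheses in total, yet the final bin is as narrow as one of 256 uniform bins over the same range. That trade of dense sampling for repeated classification is what keeps the search space small.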