Computer Science ›› 2025, Vol. 52 ›› Issue (3): 231-238. doi: 10.11896/jsjkx.231200111

• Computer Graphics & Multimedia •

  • Corresponding author: WANG Zhaohui (zhwang_pdoc@163.com)
  • First author's e-mail: Cgy2001@wust.edu.cn

Multi-view Stereo Reconstruction with Context-guided Cost Volume and Depth Refinement

CHEN Guangyuan, WANG Zhaohui, CHENG Ze   

  1. School of Computer Science and Technology,Wuhan University of Science and Technology,Wuhan 430081,China
  • Received:2023-12-18 Revised:2024-05-29 Online:2025-03-15 Published:2025-03-07
  • About author:CHEN Guangyuan,born in 2001,postgraduate.His main research interests include multi-view stereo and 3D reconstruction.
    WANG Zhaohui,born in 1967,professor,Ph.D. supervisor.His main research interests include advanced computer control technology and biomedical information processing.
  • Supported by:
    National Natural Science Foundation of China(62302351).


Abstract: Deep learning-based multi-view stereo(MVS) reconstruction algorithms still suffer from incomplete image feature extraction,ambiguous cost volume matching,and accumulating depth errors,which lead to poor reconstruction in textureless and repetitive-texture regions.To address these problems,a cascaded MVS network based on context-guided cost volume construction and depth refinement is proposed.First,a feature fusion module based on parameter-free attention filters out irrelevant features and resolves the inconsistency of multi-scale features through feature fusion.Then,a context-guided cost volume module fuses global information to enhance the completeness and robustness of cost volume matching.Finally,a depth refinement module learns depth residuals to improve the accuracy of low-resolution depth maps.Experimental results show that,compared with MVSNet on the DTU dataset,the proposed network reduces the completeness error by 24.4%,the accuracy error by 4.1%,and the overall error by 14.3%.Its performance on the Tanks and Temples dataset is also better than that of most algorithms,demonstrating strong competitiveness.
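As a rough illustration of the kind of parameter-free attention the feature fusion module builds on(SimAM [27]),the following NumPy sketch reweights a feature map by the sigmoid of a per-position inverse energy computed from each activation's deviation from its channel statistics.The function name and the regularizer value are illustrative;this is not the paper's actual implementation.

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free (SimAM-style) attention over a (C, H, W) feature map.

    Each activation's importance is scored by its squared deviation from the
    channel mean, normalized by the channel variance; the features are then
    gated by a sigmoid of this inverse energy. No learnable weights.
    """
    c, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)        # per-channel mean
    d = (x - mu) ** 2                              # squared deviation
    var = d.sum(axis=(1, 2), keepdims=True) / n    # per-channel variance
    e_inv = d / (4.0 * (var + lam)) + 0.5          # inverse energy per position
    return x * (1.0 / (1.0 + np.exp(-e_inv)))      # sigmoid gating
```

Because the gate is a positive scalar per position,the module suppresses low-energy(uninformative) activations without changing their sign,which is why it can be dropped into a feature fusion stage at no parameter cost.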

Key words: Multi-view stereo, Feature fusion, Context guidance, Cost volume matching, Depth refinement

CLC number: TP391

References
[1]WANG X,WANG C,LIU B,et al.Multi-view stereo in the deep learning era:A comprehensive review[J].Displays,2021,70:102102.
[2]FURUKAWA Y,HERNANDEZ C.Multi-view stereo:A tutorial[J].Foundations and Trends® in Computer Graphics and Vision,2015,9(1/2):1-148.
[3]GU J,WANG Z,KUEN J,et al.Recent advances in convolutional neural networks[J].Pattern Recognition,2018,77:354-377.
[4]YAO Y,LUO Z,LI S,et al.MVSNet:Depth inference for unstructured multi-view stereo[C]//Proceedings of the European Conference on Computer Vision(ECCV).2018:767-783.
[5]GU X,FAN Z,ZHU S,et al.Cascade cost volume for high-resolution multi-view stereo and stereo matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:2495-2504.
[6]YANG J,MAO W,ALVAREZ J M,et al.Cost volume pyramid based depth inference for multi-view stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:4877-4886.
[7]CHENG S,XU Z,ZHU S,et al.Deep stereo using adaptive thin volume representation with uncertainty awareness[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:2524-2534.
[8]LIN T Y,DOLLAR P,GIRSHICK R,et al.Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:2117-2125.
[9]HARIS M,SHAKHNAROVICH G,UKITA N.Deep back-projection networks for super-resolution[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:1664-1673.
[10]SINHA S N,MORDOHAI P,POLLEFEYS M.Multi-view stereo via graph cuts on the dual of an adaptive tetrahedral mesh[C]//2007 IEEE 11th International Conference on Computer Vision.IEEE,2007:1-8.
[11]FURUKAWA Y,PONCE J.Carved visual hulls for image-based modeling[C]//Computer Vision-ECCV 2006:9th European Conference on Computer Vision,Graz,Austria,May 7-13,2006.Proceedings,Part I 9.Springer Berlin Heidelberg,2006:564-577.
[12]SCHONBERGER J L,FRAHM J M.Structure-from-motion revisited[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:4104-4113.
[13]GALLIANI S,LASINGER K,SCHINDLER K.Massively pa-rallel multiview stereopsis by surface normal diffusion[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:873-881.
[14]CAMPBELL N D F,VOGIATZIS G,HERNANDEZ C,et al.Using multiple hypotheses to improve depth-maps for multi-view stereo[C]//Computer Vision-ECCV 2008:10th European Conference on Computer Vision,Marseille,France,October 12-18,2008,Proceedings,Part I 10.Springer Berlin Heidelberg,2008:766-779.
[15]TOLA E,STRECHA C,FUA P.Efficient large-scale multi-view stereo for ultra high-resolution image sets[J].Machine Vision and Applications,2012,23:903-920.
[16]KANG S B,SZELISKI R,CHAI J.Handling occlusions in dense multi-view stereo[C]//Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.CVPR 2001.IEEE,2001.
[17]JI M,GALL J,ZHENG H,et al.SurfaceNet:An end-to-end 3D neural network for multiview stereopsis[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2307-2315.
[18]YAO Y,LUO Z,LI S,et al.Recurrent MVSNet for high-resolution multi-view stereo depth inference[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:5525-5534.
[19]YU Z,GAO S.Fast-MVSNet:Sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:1949-1958.
[20]WANG F,GALLIANI S,VOGEL C,et al.PatchmatchNet:Learned multi-view patchmatch stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:14194-14203.
[21]PENG R,WANG R,WANG Z,et al.Rethinking depth estimation for multi-view stereo:A unified representation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:8645-8654.
[22]MI Z,DI C,XU D.Generalized binary search network for highly-efficient multi-view stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:12991-13000.
[23]CAO C,REN X,FU Y.MVSFormer:Learning robust image representations via transformers and temperature-based depth for multi-view stereo[J].arXiv:2208.02541,2022.
[24]DING Y,YUAN W,ZHU Q,et al.TransMVSNet:Global context-aware multi-view stereo network with transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:8585-8594.
[25]MA X,GONG Y,WANG Q,et al.EPP-MVSNet:Epipolar assembling based depth prediction for multi-view stereo[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:5732-5740.
[26]LUO A,YANG F,LI X,et al.Learning optical flow with kernel patch attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:8906-8915.
[27]YANG L,ZHANG R Y,LI L,et al.SimAM:A simple,parameter-free attention module for convolutional neural networks[C]//International Conference on Machine Learning.PMLR,2021:11863-11874.
[28]AANÆS H,JENSEN R R,VOGIATZIS G,et al.Large-scale data for multiple-view stereopsis[J].International Journal of Computer Vision,2016,120:153-168.
[29]KNAPITSCH A,PARK J,ZHOU Q Y,et al.Tanks and temples:Benchmarking large-scale scene reconstruction[J].ACM Transactions on Graphics(TOG),2017,36(4):1-13.
[30]CAMPBELL N D F,VOGIATZIS G,HERNANDEZ C,et al.Using multiple hypotheses to improve depth-maps for multi-view stereo[C]//Computer Vision-ECCV 2008:10th European Conference on Computer Vision,Marseille,France,October 12-18,2008,Proceedings,Part I 10.Springer Berlin Heidelberg,2008:766-779.
[31]GALLIANI S,LASINGER K,SCHINDLER K.Gipuma:Massively parallel multi-view stereo reconstruction[J/OL].https://www.dgpf.de/src/tagung/jt2016/proceedings/papers/34_DLT2016_Galliani_et_al.pdf.
[32]SCHONBERGER J L,FRAHM J M.Structure-from-motion revisited[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:4104-4113.
[33]CHEN R,HAN S,XU J,et al.Point-based multi-view stereo network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:1538-1547.
[34]WEI Z,ZHU Q,MIN C,et al.AA-RMVSNet:Adaptive aggregation recurrent multi-view stereo network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:6187-6196.
[35]YI P,TANG S,YAO J.DDR-Net:Learning multi-stage multi-view stereo with dynamic depth range[J].arXiv:2103.14275,2021.