Multi-view Stereo Reconstruction with Context-guided Cost Volume and Depth Refinement

doi:10.11896/jsjkx.231200111

Abstract: Deep learning-based multi-view stereo (MVS) reconstruction algorithms suffer from incomplete image feature extraction, ambiguous cost volume matching, and accumulated depth errors, which lead to poor reconstruction in textureless and repetitive-texture regions. To address these problems, a cascaded MVS network based on context-guided cost volume construction and depth refinement is proposed. First, a feature fusion module based on non-reference attention filters out irrelevant features and resolves the inconsistency among multi-scale features through feature fusion. Then, a context-guided cost volume module fuses global information to improve the accuracy and robustness of cost volume matching. Finally, a depth refinement module learns and reduces depth errors to improve the accuracy of the low-resolution depth maps. Experimental results show that, compared with MVSNet, the network reduces the completeness error on the DTU dataset by 24.4%, the accuracy error by 4.1%, and the overall error by 14.3%. Its performance on the Tanks and Temples dataset is also better than that of most algorithms, showing strong competitiveness.
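
The percentages reported in the abstract are relative reductions against the MVSNet baseline. As a minimal sketch of that arithmetic, the snippet below computes a relative error reduction. The function name `relative_reduction` and the numeric values are hypothetical placeholders for illustration, not results from the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Return the relative error reduction in percent (positive means improvement)."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - proposed) / baseline * 100.0


# Hypothetical error values (in mm), for illustration only.
baseline_err = 0.500
proposed_err = 0.400
print(f"{relative_reduction(baseline_err, proposed_err):.1f}%")
```

A lower error is better for all three metrics, so a positive return value indicates that the proposed method improves on the baseline.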

Key words: Multi-view stereo, Feature fusion, Context-guided, Cost volume matching, Depth refinement

CLC Number: TP391

CHEN Guangyuan, WANG Zhaohui, CHENG Ze. Multi-view Stereo Reconstruction with Context-guided Cost Volume and Depth Refinement[J]. Computer Science, 2025, 52(3): 231-238.
