Computer Science ›› 2025, Vol. 52 ›› Issue (3): 33-40.doi: 10.11896/jsjkx.240800069

• 3D Vision and Metaverse •

LpDepth: Self-supervised Monocular Depth Estimation Based on Laplace Pyramid

CAO Mingwei1, XING Jingjie1, CHENG Yifeng2, ZHAO Haifeng1   

  1 School of Computer Science and Technology, Anhui University, Hefei 230601, China
  2 State Grid Anhui Electric Power Research Institute, Hefei 230601, China
  • Received: 2024-08-13 Revised: 2024-09-19 Online: 2025-03-15 Published: 2025-03-07
  • About the author: CAO Mingwei, born in 1986, Ph.D., associate professor and master's supervisor, is a member of CCF (No. 49221M). His main research interests include 3D reconstruction and computer vision.
  • Supported by: Anhui Province University Research Project (2024AH050045) and National Natural Science Foundation of China (62372153, 62076005).

Abstract: Self-supervised monocular depth estimation has attracted widespread attention from researchers in China and abroad. Existing deep-learning-based methods for self-supervised monocular depth estimation mainly use encoder-decoder structures. However, these methods down-sample the input image during encoding, which discards some image information, particularly boundary information, and degrades the accuracy of the estimated depth map. To address this issue, this paper proposes a new self-supervised monocular depth estimation method based on the Laplacian pyramid. Specifically, the method enriches the encoded features with Laplacian residual images to compensate for the information lost during down-sampling, and uses max-pooling layers to highlight and amplify features during down-sampling, which makes it easier for the encoder to extract features during training. The method also uses residual modules to mitigate potential overfitting and to improve how efficiently the decoder uses features. Finally, the proposed method is evaluated on the KITTI and Make3D benchmark datasets and compared with state-of-the-art methods; the experimental results demonstrate its effectiveness.
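
The idea that Laplacian residual images retain what down-sampling discards can be illustrated with a minimal NumPy sketch. This is not the LpDepth network itself: the abstract does not specify the filters used, so the 2x2 average-pooling down-sampler, the nearest-neighbour up-sampler, and all function names below are assumptions made for illustration only (a classic Laplacian pyramid would use Gaussian filtering). Image sides are assumed to be divisible by 2 at every level.

```python
import numpy as np


def downsample(img):
    """Halve the resolution by 2x2 average pooling (an assumed, simplified filter)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double the resolution by nearest-neighbour repetition (also an assumption)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def max_pool(img):
    """2x2 max pooling: keeps the strongest response in each block."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def laplacian_pyramid(img, levels):
    """Return (residuals, coarsest).

    residuals[k] is the detail lost when level k is down-sampled to level k + 1.
    """
    residuals = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        residuals.append(current - upsample(smaller))
        current = smaller
    return residuals, current


def reconstruct(residuals, coarsest):
    """Invert the pyramid: up-sample and add back each residual, coarse to fine."""
    current = coarsest
    for res in reversed(residuals):
        current = upsample(current) + res
    return current


# A vertical step edge that falls inside a 2x2 pooling block.
edge = np.zeros((8, 8))
edge[:, 5:] = 1.0
edge_residuals, edge_coarse = laplacian_pyramid(edge, levels=1)
```

In this sketch the residual of the step-edge image is non-zero only in the two columns next to the edge, which is the boundary information that the down-sampled image alone no longer carries; adding the residuals back reconstructs the input.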

Key words: Monocular depth estimation, Laplacian pyramid, Residual networks, Depth map

CLC Number: TP391.41