Computer Science, 2025, Vol. 52, Issue (3): 33-40. doi: 10.11896/jsjkx.240800069

• 3D Vision and Metaverse •

LpDepth: Self-supervised Monocular Depth Estimation Based on Laplace Pyramid

CAO Mingwei1, XING Jingjie1, CHENG Yifeng2, ZHAO Haifeng1

  1 School of Computer Science and Technology, Anhui University, Hefei 230601, China
  2 State Grid Anhui Electric Power Research Institute, Hefei 230601, China
  • Received: 2024-08-13  Revised: 2024-09-19  Online: 2025-03-15  Published: 2025-03-07
  • Corresponding author: CAO Mingwei (cmwqq2008@163.com)
  • About author: CAO Mingwei, born in 1986, Ph.D., associate professor, master's supervisor, member of CCF (No. 49221M). His main research interests include 3D reconstruction and computer vision.
  • Supported by:
    Anhui Province University Research Project (2024AH050045) and National Natural Science Foundation of China (62372153, 62076005).

Abstract: Self-supervised monocular depth estimation has attracted wide attention from researchers in China and abroad. Existing deep-learning-based self-supervised monocular depth estimation methods mainly adopt an encoder-decoder structure. However, these methods down-sample the input image during encoding, which discards part of the image information, particularly boundary information, and thereby degrades the accuracy of the estimated depth map. To address this issue, this paper proposes a self-supervised monocular depth estimation method based on the Laplacian pyramid (LpDepth). First, the method enriches the encoded features with Laplacian residual images to compensate for the information lost during down-sampling. Second, it uses max-pooling layers during down-sampling to highlight and amplify features, making it easier for the encoder to extract the features needed to train the model. Finally, it uses residual modules to mitigate overfitting and improve how efficiently the decoder utilizes features. The proposed method is tested on benchmark datasets such as KITTI and Make3D and compared with state-of-the-art methods, and the experimental results demonstrate its effectiveness.

Key words: Monocular depth estimation, Laplacian pyramid, Residual networks, Depth map

CLC number: TP391.41
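The abstract's central idea, that a Laplacian residual image stores exactly the detail a down-sampling step throws away, can be illustrated with a small pure-Python sketch. This is not the authors' LpDepth code: the list-of-lists single-channel images, the 2×2 averaging used as the "reduce" step, the nearest-neighbour "expand" step, and every function name (`avg_pool2`, `max_pool2`, `upsample2`, `laplacian_residual`, `build_pyramid`, `reconstruct`, `residual_block`) are assumptions made for illustration. A classical Laplacian pyramid usually low-pass filters with a Gaussian kernel before sub-sampling, and a real network would apply these operations to feature tensors rather than Python lists.

```python
# Illustrative sketch only, NOT the LpDepth implementation.
# Assumptions: single-channel images stored as lists of lists of floats,
# height and width divisible by 2**levels, 2x2 averaging as the "reduce"
# step and nearest-neighbour copying as the "expand" step.
from typing import Callable

Image = list[list[float]]


def avg_pool2(img: Image) -> Image:
    """Halve resolution by averaging each non-overlapping 2x2 window."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]


def max_pool2(img: Image) -> Image:
    """Halve resolution by keeping the largest value in each 2x2 window."""
    return [
        [max(img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]


def upsample2(img: Image) -> Image:
    """Double resolution by repeating every value into a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent lists
    return out


def add(a: Image, b: Image) -> Image:
    """Element-wise sum of two equally sized images."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def laplacian_residual(img: Image) -> Image:
    """Detail lost by one reduce/expand round trip: img - expand(reduce(img))."""
    coarse = upsample2(avg_pool2(img))
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(img, coarse)]


def build_pyramid(img: Image, levels: int) -> tuple[list[Image], Image]:
    """Return the residual at each level plus the coarsest approximation."""
    residuals: list[Image] = []
    cur = img
    for _ in range(levels):
        residuals.append(laplacian_residual(cur))
        cur = avg_pool2(cur)
    return residuals, cur


def reconstruct(residuals: list[Image], coarsest: Image) -> Image:
    """Invert build_pyramid: expand the coarse level and add back each residual."""
    cur = coarsest
    for res in reversed(residuals):
        cur = add(upsample2(cur), res)
    return cur


def residual_block(feat: Image, transform: Callable[[Image], Image]) -> Image:
    """Residual connection y = x + F(x); `transform` is a hypothetical stand-in for learned layers."""
    return add(feat, transform(feat))


if __name__ == "__main__":
    # 4x4 image with a vertical edge between the third and fourth columns.
    img = [[0, 0, 0, 8] for _ in range(4)]
    residuals, coarsest = build_pyramid(img, levels=2)
    print(residuals[0][0])                          # level-0 detail is concentrated at the edge
    print(reconstruct(residuals, coarsest) == img)  # coarse level + residuals rebuild the input
    print(avg_pool2(img)[0], max_pool2(img)[0])     # averaging vs max pooling on the same rows
```

Running it prints the level-0 residual row `[0.0, 0.0, -4.0, 4.0]`, which is nonzero only around the edge, confirms that the coarsest level plus the stored residuals reconstruct the input exactly, and shows 2×2 max pooling keeping the edge value 8 where averaging reports 4. That exact-reconstruction property is why residual images can, in principle, supply boundary detail that plain down-sampling removes.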