Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240900126-7. doi: 10.11896/jsjkx.240900126

• Image Processing & Multimedia Technology •

  • Corresponding author: YIN Jibin (yjblovelh@aliyun.com)
  • About author: (20222204117@stu.kust.edu.cn)

TalentDepth: A Monocular Depth Estimation Model for Complex Weather Scenarios Based on Multiscale Attention Mechanism

ZHANG Hang1, WEI Shoulin2, YIN Jibin2   

  1 Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650550,China
    2 Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650550,China
  • Online:2025-06-16 Published:2025-06-12
  • About author: ZHANG Hang, born in 1999, postgraduate, is a student member of CCF (No. V2679G). His main research interests include depth estimation and deep learning.
    YIN Jibin, born in 1976, Ph.D. His main research interests include human-computer interaction, deep learning, wearable devices, and computational intelligence.
  • Supported by:
    National Natural Science Foundation of China(61741206).



Abstract: To address the inaccurate prediction of depth information caused by the blur, low contrast, and color distortion of complex weather scene images, previous studies have used the depth map of a standard scene as prior information for depth estimation in such scenes. However, this approach suffers from the low accuracy of that prior information. This paper proposes TalentDepth, a monocular depth estimation model based on a multiscale attention mechanism, to predict depth in complex weather scenes. First, a multiscale attention mechanism is fused into the encoder, which reduces computational cost while retaining the information of each channel, improving the efficiency and capability of feature extraction. Second, to address unclear image depth, a depth region refinement (DSR) module based on geometric consistency is proposed to filter out inaccurate pixels and improve the reliability of the depth information. Finally, complex samples generated by an image translation model are fed to the network, and the standard loss is computed on the corresponding original images to guide the model's self-supervised training. On the nuScenes, KITTI, and KITTI-C datasets, the proposed model improves both error and accuracy metrics over the baseline model.
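The exact formulation of the DSR module is not given on this page; as a rough illustration of how geometric consistency can filter out unreliable pixels, here is a minimal sketch in which two depth estimates for the same view (e.g. the predicted depth and the depth warped from an adjacent frame) are compared per pixel. The symmetric relative-difference measure and the threshold value are assumptions, in the spirit of common self-supervised depth pipelines, not the paper's actual method:

```python
import numpy as np

def consistency_mask(depth_a: np.ndarray, depth_b: np.ndarray,
                     thresh: float = 0.15) -> np.ndarray:
    """Keep pixels whose two depth estimates agree; filter the rest.

    Returns a boolean mask: True = reliable pixel, False = filtered out.
    """
    # Symmetric relative difference, insensitive to absolute depth scale.
    diff = np.abs(depth_a - depth_b) / (depth_a + depth_b + 1e-7)
    return diff < thresh

# Toy example: two 2x2 depth maps that agree everywhere except one pixel.
d1 = np.array([[1.0, 2.0], [3.0, 4.0]])
d2 = np.array([[1.05, 2.1], [3.0, 9.0]])
mask = consistency_mask(d1, d2)
# Only the bottom-right pixel (4.0 vs 9.0) exceeds the threshold.
```

A mask like this can then be multiplied into a per-pixel reconstruction loss so that inconsistent regions do not contribute gradient during self-supervised training.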

Key words: Monocular depth estimation, Self-supervised learning, Multiscale attention, Knowledge distillation, Deep learning

CLC number: 

  • TP183
[1]LI A,HU A,XI W,et al.Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion[J].arXiv:2404.07545,2024.
[2]LI T,HU T,WU D D.Monocular depth estimation combining pyramid structure and attention mechanism[J].Journal of Graphics,2024,45(3):454.
[3]LI Y,SU J,LIU L,et al.Object Detection Based on the Fusion of Sparse LiDAR Point Cloud and Dense Stereo Pseudo Point Cloud[C]//2024 4th International Conference on Neural Networks,Information and Communication(NNICE).IEEE,2024:860-863.
[4]NGUYEN H C,WANG T,ALVAREZ J M,et al.Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2024:10446-10455.
[5]SAUNDERS K,VOGIATZIS G,MANSO L J.Self-supervised Monocular Depth Estimation:Let’s Talk About The Weather[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:8907-8917.
[6]LIN X,LI N.Self-supervised learning monocular depth estimation from internet photos[J].Journal of Visual Communication and Image Representation,2024,99:104063.
[7]LIU L,SONG X,WANG M,et al.Self-supervised monocular depth estimation for all day images using domain separation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:12737-12746.
[8]HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[9]GODARD C,MAC AODHA O,FIRMAN M,et al.Digging into self-supervised monocular depth estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:3828-3838.
[10]RANJAN A,JAMPANI V,BALLES L,et al.Competitive collaboration:Joint unsupervised learning of depth,camera motion,optical flow and motion segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:12240-12249.
[11]LEE S,RAMEAU F,PAN F,et al.Attentive and contrastive learning for joint depth and motion field estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:4862-4871.
[12]FENG Z,YANG L,JING L,et al.Disentangling object motion and occlusion for unsupervised multi-frame monocular depth[C]//European Conference on Computer Vision.Springer,2022.
[13]LI H,GORDON A,ZHAO H,et al.Unsupervised monocular depth learning in dynamic scenes[C]//Conference on Robot Learning.PMLR,2021:1908-1917.
[14]HUI T W.Rm-depth:Unsupervised learning of recurrent monocular depth in dynamic scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:1675-1684.
[15]DOSOVITSKIY A,FISCHER P,ILG E,et al.Flownet:Learning optical flow with convolutional networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:2758-2766.
[16]ZOU Y,LUO Z,HUANG J B.Df-net:Unsupervised joint learning of depth and flow using cross-task consistency[C]//Proceedings of the European Conference on Computer Vision(ECCV).2018:36-53.
[17]VANKADARI M,GARG S,MAJUMDER A,et al.Unsupervised monocular depth estimation for night-time images using adversarial domain feature adaptation[C]//Computer Vision-ECCV 2020:16th European Conference,Glasgow,UK,Part XXVIII 16.Springer International Publishing,2020:443-459.
[18]GASPERINI S,MORBITZER N,JUNG H J,et al.Robust monocular depth estimation under challenging conditions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:8177-8186.
[19]WANG Z,BOVIK A C,SHEIKH H R,et al.Image quality assessment:from error visibility to structural similarity[J].IEEE Transactions on Image Processing,2004,13(4):600-612.
[20]ZHOU T,BROWN M,SNAVELY N,et al.Unsupervised learning of depth and ego-motion from video[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1851-1858.
[21]GASPERINI S,KOCH P,DALLABETTA V,et al.R4Dyn:Exploring radar for self-supervised monocular depth estimation of dynamic scenes[C]//2021 International Conference on 3D Vision(3DV).IEEE,2021:751-760.
[22]PITROPOV M,GARCIA D E,REBELLO J,et al.Canadian adverse driving conditions dataset[J].The International Journal of Robotics Research,2021,40(4/5):681-690.
[23]SAKARIDIS C,DAI D,VAN GOOL L.ACDC:The adverse conditions dataset with correspondences for semantic driving scene understanding[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:10765-10775.
[24]ZHU J Y,PARK T,ISOLA P,et al.Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2223-2232.
[25]WANG K,ZHANG Z,YAN Z,et al.Regularizing nighttime weirdness:Efficient self-supervised monocular depth estimation in the dark[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:16055-16064.
[26]WEI X,YE X,MEI X,et al.Enforcing high frequency enhancement in deep networks for simultaneous depth estimation and dehazing[J].Applied Soft Computing,2024,163:111873.
[27]ZHAO C,ZHANG Y,POGGI M,et al.Monovit:Self-supervised monocular depth estimation with a vision transformer[C]//2022 International Conference on 3D Vision(3DV).IEEE,2022:668-678.
[28]ZHOU H,GREENWOOD D,TAYLOR S.Self-supervised monocular depth estimation with internal feature fusion[J].arXiv:2110.09482,2021.