TalentDepth: A Monocular Depth Estimation Model for Complex Weather Scenarios Based on Multiscale Attention Mechanism

doi:10.11896/jsjkx.240900126

Abstract

Images of complex weather scenes are blurred, low in contrast, and color-distorted, which makes the predicted depth information inaccurate. Previous studies have used the depth map of a standard scene as prior information for depth estimation in such scenes, but this prior is often of low accuracy. This paper proposes TalentDepth, a monocular depth estimation model based on a multiscale attention mechanism, to predict depth in complex weather scenes. First, a multiscale attention mechanism is fused into the encoder. It reduces computational cost while retaining the information of every channel, improving both the efficiency and the capability of feature extraction. Second, to address unclear image depth, a Depth Region Refinement (DSR) module based on geometric consistency is proposed. It filters out inaccurate pixels to improve the reliability of the depth information. Finally, complex-weather samples generated by an image translation model are fed to the network, while the standard loss is computed on the corresponding original images to guide self-supervised training. On three datasets, nuScenes, KITTI, and KITTI-C, both the error and the accuracy metrics improve over the baseline model.
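The training scheme in the abstract is easier to see in code. The key point is that the depth network sees the weather-translated image, while the self-supervised loss is measured on the clean original frames. What follows is a minimal numpy sketch under stated assumptions, not the authors' implementation.

Several choices are assumptions:

* The photometric error is the Monodepth2-style mix of SSIM and L1 (weight 0.85).
* The pixel filter is a normalized depth-difference mask in the style of SC-SfMLearner. It stands in for the geometric-consistency idea behind DSR; the paper's actual DSR module may differ.
* `depth_net` and `warp` are hypothetical callables.
* The 0.1 threshold is an arbitrary illustrative value.

```python
import numpy as np


def box_mean(x):
    """3x3 local mean with reflect padding (2-D grayscale images)."""
    p = np.pad(x, 1, mode="reflect")
    h, w = x.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM over 3x3 windows; assumes intensities in [0, 1]."""
    mu_x, mu_y = box_mean(x), box_mean(y)
    sigma_x = box_mean(x * x) - mu_x ** 2
    sigma_y = box_mean(y * y) - mu_y ** 2
    sigma_xy = box_mean(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return np.clip(num / den, -1.0, 1.0)


def photometric_error(pred, target, alpha=0.85):
    """Monodepth2-style per-pixel error: alpha * (1 - SSIM) / 2 + (1 - alpha) * L1."""
    dssim = (1.0 - ssim(pred, target)) / 2.0
    return alpha * dssim + (1.0 - alpha) * np.abs(pred - target)


def consistency_mask(depth_a, depth_b, max_diff=0.1):
    """Keep pixels whose normalized depth disagreement is below max_diff.

    Illustrative stand-in for a geometric-consistency filter (assumption),
    not the paper's DSR module.
    """
    diff = np.abs(depth_a - depth_b) / (depth_a + depth_b)
    return diff < max_diff


def training_loss(depth_net, warp, target_clean, target_aug, source_clean):
    """Predict depth from the augmented frame, but score it on the clean frames.

    depth_net(img) -> depth map, and
    warp(src, depth) -> (reconstructed target, source depth projected into
    the target view) are hypothetical callables supplied by the caller.
    """
    depth = depth_net(target_aug)                       # network sees the weather-translated image
    recon, projected_depth = warp(source_clean, depth)  # view synthesis from the CLEAN source frame
    err = photometric_error(recon, target_clean)        # loss measured against the CLEAN target
    mask = consistency_mask(depth, projected_depth)     # drop geometrically inconsistent pixels
    return float((err * mask).sum() / max(mask.sum(), 1))
```

In a real pipeline, `warp` would back-project pixels with the predicted depth and camera intrinsics, transform them by the estimated relative pose, and resample the source frame. Here it is left abstract so the sketch isolates the "augmented input, clean supervision" idea. The loss therefore never rewards the network for reproducing weather artifacts, because the translated image never appears in the photometric target.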

Key words: Monocular depth estimation, Self-supervised learning, Multiscale attention, Knowledge distillation, Deep learning

CLC Number: TP183

ZHANG Hang, WEI Shoulin, YIN Jibin. TalentDepth: A Monocular Depth Estimation Model for Complex Weather Scenarios Based on Multiscale Attention Mechanism[J]. Computer Science, 2025, 52(6A): 240900126-7.
