计算机科学 (Computer Science) ›› 2025, Vol. 52 ›› Issue (11A): 250300040-9. doi: 10.11896/jsjkx.250300040
JIA Hongjun1, ZHANG Hailong3, LI Jingguo1, ZHANG Huimin4, HAN Chenggong4, JIANG He2,4
Abstract: In outdoor depth estimation, conventional U-shaped network models tend to ignore the correlations and differences among features during feature extraction and fusion, and thus fail to fully exploit the interaction information between features. To address this problem, an outdoor monocular depth estimation method based on Gram matrix attention is proposed. Specifically, by exploiting the decomposition properties of the Gram matrix, an inter-feature correlation matrix and a difference matrix are designed, strengthening the information interaction among features and their representational capability. On this basis, the masks generated by the Gram matrix attention mechanism are further deeply fused with the features extracted by the convolutional layers. By combining the salient features highlighted by the attention mechanism with the fine details captured by the convolutional layers, the method achieves both diversity and completeness in feature representation. Extensive experiments on the outdoor KITTI dataset show that introducing the Gram matrix attention mechanism improves network performance: the proposed method raises the δ1 metric to 0.880 and lowers the absolute error metric to 0.112. Results on the Make3D dataset further confirm the model's superiority, with the absolute relative error, root-mean-square relative error, and root-mean-square error reaching 0.318, 3.174, and 7.163, respectively.
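The abstract does not specify the exact formulation of the Gram matrix attention block. The PyTorch sketch below is only a rough, hypothetical illustration of the idea: it treats the channel-wise Gram matrix G = FFᵀ as the correlation matrix and derives a difference matrix from G via the identity ‖f_i − f_j‖² = G_ii − 2G_ij + G_jj. The module name `GramMatrixAttention` and the fusion scheme (softmax masks plus a 1×1 convolution with a residual connection) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Gram-matrix attention block (not the paper's code).
import torch
import torch.nn as nn


class GramMatrixAttention(nn.Module):
    """Reweights channel features with masks derived from the Gram matrix.

    Assumed formulation: the correlation matrix is the Gram matrix
    G = F F^T over flattened channel features, and the difference matrix
    is the pairwise squared-distance matrix D_ij = G_ii - 2 G_ij + G_jj,
    which is fully determined by G.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution fuses the two attention outputs back to `channels`
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        f = x.view(b, c, h * w)                   # flatten spatial dims
        g = torch.bmm(f, f.transpose(1, 2))       # Gram matrix, (b, c, c)
        diag = torch.diagonal(g, dim1=1, dim2=2)  # G_ii, (b, c)
        # Pairwise squared distances between channel features, from G alone
        d = diag.unsqueeze(2) - 2 * g + diag.unsqueeze(1)
        scale = float(h * w) ** 0.5               # softmax stabilization
        corr_mask = torch.softmax(g / scale, dim=-1)   # correlation mask
        diff_mask = torch.softmax(-d / scale, dim=-1)  # difference mask
        corr_feat = torch.bmm(corr_mask, f).view(b, c, h, w)
        diff_feat = torch.bmm(diff_mask, f).view(b, c, h, w)
        # Deep fusion of the attention-weighted features with conv features
        out = self.fuse(torch.cat([corr_feat, diff_feat], dim=1))
        return x + out                            # residual connection


# Usage: insert the block between encoder stages of a U-shaped depth network.
attn = GramMatrixAttention(channels=64)
feat = torch.randn(2, 64, 48, 160)  # e.g. a KITTI-scale feature map
print(attn(feat).shape)             # torch.Size([2, 64, 48, 160])
```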