Computer Science, 2025, Vol. 52, Issue (3): 86-94. doi: 10.11896/jsjkx.240500020

• 3D Vision and Metaverse •

Triplet Interaction Mechanism in Cross-view Geo-localization

ZHOU Bowen, LI Yang, WANG Jiabao, MIAO Zhuang, ZHANG Rui

  1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2024-05-07 Revised: 2024-10-14 Online: 2025-03-15 Published: 2025-03-07
  • Corresponding author: LI Yang (solarleeon@outlook.com)
  • About author: ZHOU Bowen, born in 1999, postgraduate. His main research interests include deep learning and cross-view geo-localization.
    LI Yang, born in 1984, Ph.D., associate professor, senior member of CCF (No. D24215). His main research interests include computer vision and image processing.
  • Supported by:
    Natural Science Foundation of Jiangsu Province, China (BK20200581).

Abstract: Cross-view geo-localization is an image retrieval task: an image without geographic coordinates, taken from one viewpoint, is matched against a database of geo-tagged images taken from other viewpoints in order to infer the geographic location of the query image. However, most existing methods neglect global position information and feature completeness, so the model fails to capture deep semantic information. In addition, existing two-dimensional interaction schemes do not fully exploit the relationships between dimensions, which leaves cross-dimensional interaction insufficient. To address these issues, this paper designs a triplet interaction mechanism (TIM) for cross-view geo-localization. The method uses ConvNeXt as the feature extraction network, applies the proposed TIM to enrich the extracted features, and guides model training with a joint loss function. The model performs three-dimensional interaction multiple times internally, which alleviates the partial information loss caused by projecting features into two dimensions. Furthermore, TIM applies a different attention mechanism in each of its three branches, making the model robust to translation, scaling, and rotation between cross-view images. Experimental results show that the proposed method outperforms the compared methods and achieves the best performance on both the drone-view localization and drone navigation tasks of the University-1652 dataset.

Key words: Cross-view, Geo-localization, Interaction mechanism, Feature attention
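The abstract describes TIM only at a high level. The sketch below illustrates the general idea it builds on, the triplet attention module of Misra et al. (2021, "Rotate to Attend"): each of three branches pools away one axis of a C×H×W feature map (Z-pool: max and mean along that axis), passes the two pooled maps through a k×k convolution and a sigmoid, and rescales the input, so that the (H, W), (C, W) and (C, H) axis pairs each interact; the three results are averaged. Pooling directly along each axis is equivalent to the permute, pool, and permute-back pattern used in tensor-library implementations. This is not the authors' code. Assumptions: plain Python lists stand in for a backbone feature map (ConvNeXt in the paper), kernels are random and untrained, batch normalization is omitted, and all three branches use the same gate type, whereas the paper states that TIM uses a different attention in each branch, which the abstract does not specify. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List

Map2D = List[List[float]]
Tensor3D = List[List[List[float]]]  # feature map indexed as x[c][h][w]


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def conv2d_same(maps: List[Map2D], kernels: List[Map2D]) -> Map2D:
    """Zero-padded 'same' 2D cross-correlation (as in deep-learning frameworks)
    of several input maps, summed into a single output map."""
    rows, cols = len(maps[0]), len(maps[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for m, ker in zip(maps, kernels):
        for i in range(rows):
            for j in range(cols):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < rows and 0 <= jj < cols:
                            acc += ker[di][dj] * m[ii][jj]
                out[i][j] += acc
    return out


def gate(planes: List[Map2D], kernels: List[Map2D]) -> Map2D:
    """Z-pool over the list of planes (max and mean), then conv, then sigmoid."""
    rows, cols = len(planes[0]), len(planes[0][0])
    max_map = [[max(p[i][j] for p in planes) for j in range(cols)] for i in range(rows)]
    mean_map = [[sum(p[i][j] for p in planes) / len(planes) for j in range(cols)]
                for i in range(rows)]
    logits = conv2d_same([max_map, mean_map], kernels)
    return [[sigmoid(v) for v in row] for row in logits]


def triplet_interaction(x: Tensor3D, kernels: Dict[str, List[Map2D]]) -> Tensor3D:
    """Hypothetical three-branch cross-dimension gating in the style of triplet attention."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Branch H-W: pool over channels, giving a spatial gate g_hw[h][w].
    g_hw = gate([x[c] for c in range(C)], kernels["hw"])
    # Branch C-W: pool over H (planes indexed [h][c][w]), giving g_cw[c][w].
    g_cw = gate([[[x[c][h][w] for w in range(W)] for c in range(C)] for h in range(H)],
                kernels["cw"])
    # Branch C-H: pool over W (planes indexed [w][c][h]), giving g_ch[c][h].
    g_ch = gate([[[x[c][h][w] for h in range(H)] for c in range(C)] for w in range(W)],
                kernels["ch"])
    # Rescale the input by the average of the three branch gates.
    return [[[x[c][h][w] * ((g_hw[h][w] + g_cw[c][w] + g_ch[c][h]) / 3.0)
              for w in range(W)] for h in range(H)] for c in range(C)]


def random_kernels(k: int = 7, seed: int = 0) -> Dict[str, List[Map2D]]:
    # Two k x k kernels per branch: one for the max map, one for the mean map.
    rng = random.Random(seed)

    def one() -> Map2D:
        return [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(k)]

    return {name: [one(), one()] for name in ("hw", "cw", "ch")}


if __name__ == "__main__":
    rng = random.Random(1)
    C, H, W = 4, 5, 6
    x = [[[rng.gauss(0.0, 1.0) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    y = triplet_interaction(x, random_kernels(k=3))
    print(len(y), len(y[0]), len(y[0][0]))  # 4 5 6
```

The output keeps the input shape, and because every gate lies in (0, 1), each element is rescaled rather than replaced. With all-zero kernels every gate equals 0.5, so the module simply halves the input; training the kernels is what lets each branch emphasize different channel-spatial combinations.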

CLC number: TP391