Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 221100066-7. doi: 10.11896/jsjkx.221100066

• Image Processing & Multimedia Technology •

Cross-view Geo-visual Localization

LIU Xudong1, YU Ping2

  1 Wudong Coal Mine, Xinjiang Energy Co., Ltd., China Energy Investment Corporation, Urumqi 830000, China
    2 CHN Energy Network Information Technology (Beijing) Co., Ltd., Beijing 100011, China
  • Published: 2023-11-09
  • Corresponding author: YU Ping (20049948@ceic.com)
  • About author: (415576201@qq.com)

Cross-view Geo-visual Localization

LIU Xudong1, YU Ping2   

  1 Wudong Colliery, CHN ENERGY, Urumqi 830000, China
    2 CHN Energy Network Information Technology Co., Ltd., Beijing 100011, China
  • Published: 2023-11-09
  • About author: LIU Xudong, born in 1980, senior engineer. His main research interests include mining technology and intelligent technology.
    YU Ping, born in 1978, bachelor. His main research interests include deep learning and computer vision.

Abstract: With the explosive growth of smart terminal devices and the rapid rise of the mobile Internet, the demand for location-based services has become increasingly prominent in many scenarios, such as remote, sparsely populated mountainous areas. However, because GPS signals in these areas are blocked or base stations are difficult to deploy, GPS positioning cannot work properly. Image geo-localization refers to determining the shooting location of an image from visual information alone. Without any prior knowledge, predicting the geographic location of a photo is a very difficult task, because images taken under different conditions (e.g., different weather, objects or camera settings) exhibit huge variations. This paper explores a cross-view geo-visual localization method for images. First, the inverse polar coordinate transformation is used to convert the street-view perspective into an aerial-view-like image, so as to reduce the domain gap between the two. Then, deep learning is used to encode images from different perspectives into more robust global descriptor vectors, on the basis of which image matching and localization of the street-view query image are performed. For image feature extraction, the VGG16 model is adopted, using deeper stacks of small convolution kernels to enlarge the receptive field of the network while saving parameters. For feature encoding, a multi-scale attention mechanism is integrated into the NetVLAD model, and the features extracted by the backbone are encoded into a more robust global feature descriptor vector. Experimental results show that the method achieves high-accuracy matching and localization of street-view queries, with higher matching accuracy than existing methods. Moreover, high-definition street views captured by professional equipment are not required; street views taken with ordinary smartphones already achieve good matching and localization accuracy.
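A minimal NumPy/OpenCV sketch of the inverse polar coordinate resampling described above, assuming an equirectangular street-view panorama as input; the output resolution, orientation convention and remapping details are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np
import cv2


def inverse_polar_transform(pano, out_size=512):
    """Resample a ground-level panorama into an aerial-view-like layout.

    Each pixel of the synthesized aerial image is filled with the panorama
    pixel whose azimuth equals its polar angle and whose row grows with the
    polar radius (hypothetical choices: north-up orientation, linear radius).
    """
    h, w = pano.shape[:2]
    c = (out_size - 1) / 2.0                         # aerial-image centre = camera position
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    dx, dy = xs - c, ys - c
    radius = np.sqrt(dx ** 2 + dy ** 2)              # distance from the camera
    azimuth = np.arctan2(dx, -dy) % (2 * np.pi)      # 0 rad points up (north), clockwise

    map_x = (azimuth / (2 * np.pi)) * (w - 1)        # azimuth -> panorama column
    map_y = np.clip(radius / c, 0.0, 1.0) * (h - 1)  # radius  -> panorama row
    # Depending on the panorama convention, the row mapping may need to be
    # flipped, e.g. map_y = (h - 1) - map_y.
    return cv2.remap(pano, map_x.astype(np.float32), map_y.astype(np.float32),
                     interpolation=cv2.INTER_LINEAR)
```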

Keywords: Cross-view localization, Inverse polar coordinate transformation, NetVLAD, Multi-scale attention

Abstract: With the explosive growth of smart terminal equipment and the rapid rise of the mobile Internet, the demand for location-based services has become more and more prominent in many scenarios, such as indoor environments and remote, sparsely populated mountainous areas. However, because GPS signals in these areas are blocked or signal base stations are difficult to cover, GPS positioning cannot work properly. Image-based geo-localization refers to determining the location of an image based only on visual information. Without any prior knowledge, predicting the geographic location of a photo is a very difficult task, because images taken under different conditions (e.g., different weather, objects or camera settings) exhibit huge variations. This paper attempts to explore a cross-view geo-localization method. First, the inverse polar coordinate transformation is used to convert the street-view perspective into an aerial-view-like image, so as to reduce the domain gap between the two. Then, deep learning is used to encode images from different perspectives to obtain more robust global vector descriptors. Finally, image matching and localization of the street-view query are performed on this basis. For image feature extraction, the VGG16 model is adopted, and deeper stacks of small convolution kernels are used to increase the receptive field of the network while saving parameters. For feature encoding, the multi-scale attention mechanism is integrated into the NetVLAD model, and the features extracted by the backbone model are encoded into a more robust global feature descriptor vector. Experimental results show that the proposed method achieves higher matching accuracy than existing methods. Moreover, high-definition street views captured by professional equipment are not required; street views captured by ordinary smartphones can already obtain good matching accuracy.
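A PyTorch sketch (not the authors' released code) of how a multi-scale attention map can be fused into NetVLAD-style aggregation over backbone features; the cluster count, pooling scales and average fusion are assumptions made for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionalNetVLAD(nn.Module):
    """NetVLAD aggregation gated by a multi-scale spatial attention map."""

    def __init__(self, dim=512, num_clusters=64, scales=(1, 2, 4)):
        super().__init__()
        self.num_clusters = num_clusters
        self.assign = nn.Conv2d(dim, num_clusters, kernel_size=1)   # soft cluster assignment
        self.centroids = nn.Parameter(0.01 * torch.randn(num_clusters, dim))
        self.att = nn.Conv2d(dim, 1, kernel_size=1)                 # per-scale attention score
        self.scales = scales

    def multi_scale_attention(self, x):
        h, w = x.shape[-2:]
        maps = []
        for s in self.scales:
            pooled = F.adaptive_avg_pool2d(x, (max(h // s, 1), max(w // s, 1)))
            a = F.interpolate(self.att(pooled), size=(h, w),
                              mode='bilinear', align_corners=False)
            maps.append(a)
        return torch.sigmoid(torch.stack(maps, 0).mean(0))          # N x 1 x H x W

    def forward(self, x):                          # x: N x C x H x W backbone features
        x = x * self.multi_scale_attention(x)      # re-weight local features
        soft = F.softmax(self.assign(x), dim=1).flatten(2)          # N x K x HW
        feats = x.flatten(2)                                        # N x C x HW
        # weighted residuals to every centroid, then the usual two normalizations
        vlad = torch.einsum('nkl,ncl->nkc', soft, feats) \
             - soft.sum(-1, keepdim=True) * self.centroids.unsqueeze(0)
        vlad = F.normalize(vlad, dim=2)            # intra-normalization per cluster
        return F.normalize(vlad.flatten(1), dim=1) # L2-normalized global descriptor
```

A descriptor produced this way for the polar-transformed street-view query can be compared directly with descriptors of the aerial reference images.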

Key words: Cross-view geo-localization, Inverse polar transform, NetVLAD, Multi-scale attention
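For the final matching and localization step mentioned in both abstracts, a hedged sketch of the standard retrieval procedure: the query descriptor is compared against the descriptors of geo-tagged aerial reference images and the query is localized at the best match (the exact ranking or verification used in the paper may differ):

```python
import numpy as np


def localize(query_desc, ref_descs, ref_coords, top_k=5):
    """Nearest-neighbour localization of a street-view query.

    query_desc : (D,)   L2-normalized descriptor of the street-view query
    ref_descs  : (M, D) L2-normalized descriptors of the aerial reference set
    ref_coords : (M, 2) latitude/longitude of each reference image
    Returns the top-k candidate locations and their similarity scores.
    """
    sims = ref_descs @ query_desc          # cosine similarity (unit-norm descriptors)
    order = np.argsort(-sims)[:top_k]      # most similar references first
    return ref_coords[order], sims[order]
```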

CLC Number: TP391