Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 230300045-6. doi: 10.11896/jsjkx.230300045
辛瑞, 张霄力, 彭侠夫, 陈锦文
XIN Rui, ZHANG Xiaoli, PENG Xiafu, CHEN Jinwen
Abstract: Most current scene-matching algorithms rely on traditional feature-point matching, whose pipeline consists of feature detection followed by feature matching; on weakly textured scenes, such methods suffer from low accuracy and low matching success rates. UFormer proposes an end-to-end, Transformer-based scheme that unifies feature extraction and feature matching, using attention mechanisms to improve robustness to weakly textured scenes. Inspired by the U-Net architecture, UFormer builds a coarse-to-fine, subpixel-level mapping between images on top of an encoder-decoder structure. The encoder uses an interleaved self-cross attention structure to detect and extract correlated features of the image pair at each scale, establish feature connections, and downsample for coarse-grained matching, which provides initial positions. The decoder upsamples to restore image resolution and fuses the attention feature maps at each scale to perform fine-grained matching, refining the matches to subpixel accuracy by taking an expectation. A ground-truth homography is introduced to compute the Euclidean distance between coarse- and fine-grained matched point coordinates as the feedback loss that supervises network training. By fusing feature detection and feature matching, UFormer has a simpler structure and improves real-time performance while maintaining accuracy, and it exhibits a degree of robustness to weakly textured scenes. On a collected UAV flight-trajectory dataset, compared with SIFT, coordinate accuracy improves by 0.183 pixels, matching time drops to 0.106 s, and the matching success rate on weakly textured images is higher.
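Two ideas in the abstract lend themselves to a small illustration: refining a match to subpixel precision by taking the expectation of coordinates under a softmax over local scores, and supervising matches with the Euclidean distance to positions warped by a ground-truth homography. The sketch below shows both in plain Python; the function names, the local-window formulation, and the mean-distance aggregation are illustrative assumptions, not the paper's exact implementation.

```python
import math

def subpixel_refine(score_patch):
    """Expectation-based refinement: softmax over a local score window,
    then the expected (dx, dy) offset from the window center."""
    h, w = len(score_patch), len(score_patch[0])
    m = max(max(row) for row in score_patch)          # for numerical stability
    e = [[math.exp(s - m) for s in row] for row in score_patch]
    z = sum(sum(row) for row in e)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dy = sum((i - cy) * sum(row) for i, row in enumerate(e)) / z
    dx = sum((j - cx) * v for row in e for j, v in enumerate(row)) / z
    return dx, dy

def warp_point(H, x, y):
    """Apply a 3x3 homography to (x, y) with perspective division."""
    xw = H[0][0] * x + H[0][1] * y + H[0][2]
    yw = H[1][0] * x + H[1][1] * y + H[1][2]
    s = H[2][0] * x + H[2][1] * y + H[2][2]
    return xw / s, yw / s

def homography_loss(H_gt, matches):
    """Mean Euclidean distance between predicted positions in image 2 and
    ground-truth positions obtained by warping image-1 points with H_gt."""
    total = 0.0
    for (x1, y1), (x2, y2) in matches:
        gx, gy = warp_point(H_gt, x1, y1)
        total += math.hypot(x2 - gx, y2 - gy)
    return total / len(matches)
```

A symmetric score window yields a zero offset, scores skewed toward one side pull the expectation in that direction, and perfect matches under the ground-truth homography give zero loss.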