Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 230300045-6.doi: 10.11896/jsjkx.230300045

• Image Processing & Multimedia Technology •

UFormer:An End-to-End Feature Point Scene Matching Algorithm Based on Transformer and U-Net

XIN Rui, ZHANG Xiaoli, PENG Xiafu, CHEN Jinwen   

  1. School of Aerospace Engineering,Xiamen University,Xiamen,Fujian 361005,China
  • Published:2023-11-09
  • About author:XIN Rui,born in 1998,is a master student.His main research interests include scene matching and computer vision navigation.
    ZHANG Xiaoli,born in 1970,Ph.D,associate professor.His main research interests include theoretical analysis of nonlinear systems,deep learning and integrated navigation.
  • Supported by:
    Aeronautical Science Foundation of China(201958068002).

Abstract: Most current scene matching algorithms rely on traditional feature point matching, whose pipeline consists of feature detection followed by feature matching; for weak-texture scenes, both accuracy and matching success rate are low. UFormer is an end-to-end solution that performs Transformer-based feature extraction and matching, and uses an attention mechanism to improve the algorithm's ability to handle weak-texture scenes. Inspired by the U-Net architecture, UFormer builds a coarse-to-fine, sub-pixel-level mapping between images on an encoder-decoder structure. The encoder uses interleaved self- and cross-attention to detect and extract features at each image scale, establish feature correspondences, and downsample for coarse-grained matching that provides initial positions. The decoder upsamples to restore image resolution, fuses the attention feature maps at each scale, performs matching at a fine-grained level, and refines the matching results to sub-pixel precision in an expectation-based manner. A ground-truth homography matrix is introduced to compute the Euclidean distance between coarse- and fine-grained matched point coordinates as the loss that supervises network training. By integrating feature detection and feature matching, UFormer has a simpler structure, improves real-time performance while maintaining accuracy, and can handle weak-texture scenes to a certain extent. On a collected UAV trajectory dataset, compared with SIFT, coordinate accuracy improves by 0.416 pixel, matching time decreases to 0.106 s, and the matching success rate on weak-texture scene images is higher.
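As a concrete illustration of the supervision described in the abstract, the Euclidean-distance loss over coarse- and fine-grained matches can be sketched in plain Python. This is a minimal sketch, not the paper's exact formulation: the function names, the equal weighting of the two stages, and the averaging over points are assumptions for illustration.

```python
import math

def warp_point(H, pt):
    """Project a 2-D point through a 3x3 homography (row-major nested lists)."""
    x, y = pt
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)

def matching_loss(H_gt, src_pts, coarse_pred, fine_pred,
                  w_coarse=1.0, w_fine=1.0):
    """Euclidean-distance supervision: predicted match coordinates in the
    target image are compared against the source points warped by the
    ground-truth homography, at both the coarse and the fine stage."""
    def mean_dist(preds):
        total = 0.0
        for p_src, p_pred in zip(src_pts, preds):
            gx, gy = warp_point(H_gt, p_src)       # ground-truth location
            total += math.hypot(p_pred[0] - gx, p_pred[1] - gy)
        return total / len(preds)
    return w_coarse * mean_dist(coarse_pred) + w_fine * mean_dist(fine_pred)
```

With an identity homography and a fine prediction offset by (3, 4) pixels, the coarse term is zero and the fine term contributes a distance of 5 pixels, so the loss directly penalizes the sub-pixel refinement error the abstract refers to.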

Key words: Scene matching, Attention mechanism, Visual localization, Deep learning

CLC Number: 

  • TN967.2
[1]JIANG X Y,MA J Y,XIAO G B,et al.A review of multimodal image matching:Methods and applications[J].Information Fusion,2021,73:22-71.
[2]LENG C C,ZHANG H,LI B,et al.Local feature descriptor for image matching:A survey[J].IEEE Access,2018,7:6424-6434.
[3]MIAN A S,BENNAMOUN M,OWENS R,et al.Keypoint detection and local feature matching for textured 3D face recognition[J].International Journal of Computer Vision,2008,79(1):1-12.
[4]LI J,ALLINSON N M.A comprehensive review of current local features for computer vision[J].Neurocomputing,2008,71 (10/11/12):1771-1787.
[5]CHEN L,ROTTENSTEINER F,HEIPKE C,et al.Feature detection and description for image matching:from hand-crafted design to deep learning[J].Geo-spatial Information Science,2021,24(1):58-74.
[6]SARLIN P E,DETONE D,MALISIEWICZ T,et al.Superglue:Learning feature matching with graph neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:4938-4947.
[7]RONNEBERGER O,FISCHER P,BROX T,et al.U-net:Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention.Springer,2015:234-241.
[8]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[J].arXiv:1706.03762,2017.
[9]SUN J,SHEN Z,WANG Y,et al.LoFTR:Detector-free local feature matching with transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:8922-8931.
[10]LIN T Y,DOLLÁR P,GIRSHICK R,et al.Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:2117-2125.
[11]LOWE D G.Sift-the scale invariant feature transform[J].International Journal,2004,2(2):91-110.
[12]BAY H,TUYTELAARS T,GOOL L,et al.Surf:Speeded up robust features[C]//European Conference on Computer Vision.Springer,2006:404-417.
[13]SILPA-ANAN C,HARTLEY R.Optimised KD-trees for fast image descriptor matching[C]//2008 IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2008:1-8.
[14]CALONDER M,LEPETIT V,STRECHA C,et al.Brief:Binary robust independent elementary features[C]//Proceedings of the European Conference on Computer Vision.Springer,2010:778-792.
[15]VISWANATHAN D G.Features from accelerated segment test (fast)[C]//Proceedings of the 10th Workshop on Image Analysis for Multimedia Interactive Services.London,UK,2009:6-8.
[16]RUBLEE E,RABAUD V,KONOLIGE K,et al.ORB:An efficient alternative to SIFT or SURF[C]//International Conference on Computer Vision.IEEE,2011:2564-2571.
[17]DETONE D,MALISIEWICZ T,RABINOVICH A,et al.Superpoint:Self-supervised interest point detection and description[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.2018:224-236.
[18]BARROSO-LAGUNA A,RIBA E,PONSA D,et al.Key.net:Keypoint detection by handcrafted and learned cnn filters[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:5836-5844.
[19]SHEN X L,WANG C,LI X,et al.Rf-net:An end-to-end image matching network based on receptive field[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:8132-8140.
[20]LEE J,KIM B,CHO M S,et al.Self-Supervised Equivariant Learning for Oriented Keypoint Detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:4847-4857.
[21]DUSMANU M,ROCCO I,PAJDLA T,et al.D2-net:A trainable cnn for joint description and detection of local features[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:8092-8101.
[22]ONO Y,TRULLS E,FUA P,et al.LF-Net:Learning local features from images[C]//Advances in Neural Information Processing Systems.2018.
[23]REVAUD J,WEINZAEPFEL P,DE S C,et al.R2D2:repeatable and reliable detector and descriptor[J].arXiv:1906.06195,2019.
[24]YIN J,LIU Q,MENG F,et al.STCDesc:Learning deep local descriptor using similar triangle constraint[J].Knowledge-Based Systems,2022,248.
[25]TIAN Y,FAN B,WU F.L2-Net:Deep Learning of Discriminative Patch Descriptor in Euclidean Space[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2017.
[26]MISHCHUK A,MISHKIN D,RADENOVIC F,et al.Working hard to know your neighbor’s margins:Local descriptor learning loss[C]//Advances in Neural Information Processing Systems.2017.
[27]DANG Z,DENG C,YANG X,et al.Nearest neighbor matching for deep clustering[C]//Proceedings of the IEEE/CVF Confe-rence on Computer Vision and Pattern Recognition.2021:13693-13702.
[28]FISCHLER M A,BOLLES R C.Random sample consensus:a paradigm for model fitting with applications to image analysis and automated cartography[J].Communications of the ACM,1981,24(6):381-395.
[29]WANG Q,ZHANG J,YANG K,et al.MatchFormer:Interleaving Attention in Transformers for Feature Matching[J].arXiv:2203.09645,2022.