Computer Science ›› 2026, Vol. 53 ›› Issue (1): 216-223. doi: 10.11896/jsjkx.250300045

• Computer Graphics & Multimedia •

Visual Floorplan Localization Based on BEV Perception

CHEN Jiwei, CHEN Zebin, TAN Guang   

  1. School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
  • Received:2025-03-11 Revised:2025-05-09 Published:2026-01-08
  • About author: CHEN Jiwei, born in 2000, postgraduate. His main research interests include visual localization and deep learning.
    TAN Guang, born in 1978, Ph.D., professor, is a member of CCF (No.19464M). His main research interests include mobile computing, distributed computing and networking.
  • Supported by:
    National Natural Science Foundation of China(62372488).

Abstract: The visual floorplan localization task estimates camera pose by matching visual observations against a floorplan representation of the scene. In practical applications, effectively integrating the geometric and semantic correlations between observations and the floorplan during matching is particularly important for improving localization accuracy. However, existing methods have two main limitations: they fail to fully exploit the semantic information within the camera's field of view, and they lack a joint matching mechanism for geometric and semantic cues. To address these issues, this study proposes a visual floorplan localization framework based on BEV perception, which comprises three core components. First, the BEV semantic mapping module constructs a BEV semantic representation of the local scene through multimodal image projection, yielding a structured representation of the observation. Second, the expected observation generation module builds an expected observation database over the floorplan space and generates observation data rapidly through differentiable rendering. Finally, the multi-level matching and localization module introduces a geometric-semantic joint matching mechanism that fuses the geometric layout and semantic category information of the BEV observation through a hierarchical matching strategy, achieving accurate matching against the floorplan. Experimental results show that the framework improves localization recall from 0.32% to 3.12% on the public Structured3D dataset and from 4.82% to 58.77% on the self-built simulation environment dataset IndoorEnv, significantly outperforming the existing baseline methods Laser and F3Loc. This validates the effectiveness and robustness of the proposed method in complex indoor scenes.
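
To make the geometric-semantic joint matching idea concrete, the following minimal sketch (not taken from the paper) shows how a BEV observation might be scored against expected observations rendered from a floorplan, using a coarse geometric filter followed by a fused geometric-semantic score. All array shapes, weights, helper names, and the two-stage scoring scheme are illustrative assumptions, not the authors' implementation.

# Illustrative sketch of geometric-semantic joint matching for floorplan
# localization. Shapes, weights, and the two-stage scheme are assumptions
# for exposition only; they do not reproduce the paper's method.
import numpy as np

def geometric_score(obs_occ, exp_occ):
    """Layout similarity: IoU of occupied BEV cells (walls/obstacles)."""
    inter = np.logical_and(obs_occ, exp_occ).sum()
    union = np.logical_or(obs_occ, exp_occ).sum()
    return inter / union if union > 0 else 0.0

def semantic_score(obs_sem, exp_sem, valid):
    """Category agreement over cells occupied in both BEV maps."""
    if valid.sum() == 0:
        return 0.0
    return float((obs_sem[valid] == exp_sem[valid]).mean())

def localize(obs_occ, obs_sem, database, top_k=32, w_geo=0.5, w_sem=0.5):
    """Hierarchical matching: coarse geometric filtering of candidate
    poses, then a fused geometric-semantic score on the survivors."""
    geo = np.array([geometric_score(obs_occ, c["occ"]) for c in database])
    coarse = np.argsort(geo)[::-1][:top_k]            # keep best layouts
    best_pose, best_score = None, -np.inf
    for i in coarse:
        cand = database[i]
        valid = np.logical_and(obs_occ, cand["occ"])  # jointly occupied cells
        s = w_geo * geo[i] + w_sem * semantic_score(obs_sem, cand["sem"], valid)
        if s > best_score:
            best_pose, best_score = cand["pose"], s
    return best_pose, best_score

# Toy usage: a 64x64 BEV grid and candidates rendered at sampled poses.
rng = np.random.default_rng(0)
obs_occ = rng.random((64, 64)) > 0.7
obs_sem = rng.integers(0, 5, size=(64, 64))
database = [{"pose": (x, y, 0.0),
             "occ": rng.random((64, 64)) > 0.7,
             "sem": rng.integers(0, 5, size=(64, 64))}
            for x in range(4) for y in range(4)]
pose, score = localize(obs_occ, obs_sem, database)
print("estimated pose:", pose, "score:", round(score, 3))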

Key words: BEV perception, Floorplan localization, Visual localization, Geometric-semantic joint matching

CLC Number: TP391.4
[1]SARLIN P E,CADENA C,SIEGWART R,et al.From coarse to fine:Robust hierarchical localization at large scale[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:12716-12725.
[2]PANEK V,KUKELOVA Z,SATTLER T.Meshloc:Mesh-based visual localization[C]//European Conference on Computer Vision.Cham:Springer,2022:589-609.
[3]ZHOU Q,AGOSTINHO S,OŠEP A,et al.Is geometry enough for matching in visual localization?[C]//European Conference on Computer Vision.Cham:Springer,2022:407-425.
[4]LIU J,NIE Q,LIU Y,et al.Nerf-loc:Visual localization with conditional neural radiance field[C]//2023 IEEE International Conference on Robotics and Automation(ICRA).IEEE,2023:9385-9392.
[5]LIU L,LI H,DAI Y.Efficient global 2d-3d matching for camera localization in a large-scale 3d map[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2372-2381.
[6]SATTLER T,LEIBE B,KOBBELT L.Fast image-based localization using direct 2d-to-3d matching[C]//2011 International Conference on Computer Vision.IEEE,2011:667-674.
[7]SATTLER T,LEIBE B,KOBBELT L.Efficient & effective prioritized matching for large-scale image-based localization[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,39(9):1744-1756.
[8]ARANDJELOVIC R,GRONAT P,TORII A,et al.NetVLAD:CNN architecture for weakly supervised place recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:5297-5307.
[9]BALNTAS V,LI S,PRISACARIU V.Relocnet:Continuous metric learning relocalisation using neural nets[C]//Proceedings of the European Conference on Computer Vision(ECCV).2018:751-767.
[10]SCHINDLER G,BROWN M,SZELISKI R.City-scale location recognition[C]//2007 IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2007:1-7.
[11]BONIARDI F,VALADA A,MOHAN R,et al.Robot localization in floor plans using a room layout edge extraction network[C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS).IEEE,2019:5291-5297.
[12]MIN Z,KHOSRAVAN N,BESSINGER Z,et al.Laser:Latent space rendering for 2d visual localization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:11122-11131.
[13]BONIARDI F,CASELITZ T,KÜMMERLE R,et al.Robust LiDAR-based localization in architectural floor plans[C]//2017 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS).IEEE,2017:3318-3324.
[14]BONIARDI F,CASELITZ T,KÜMMERLE R,et al.A pose graph-based localization system for long-term navigation in CAD floor plans[J].Robotics and Autonomous Systems,2019,112:84-97.
[15]LI Z,ANG M H,RUS D.Online localization with imprecise floor space maps using stochastic gradient descent[C]//2020 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS).IEEE,2020:8571-8578.
[16]MENDEZ O,HADFIELD S,PUGEAULT N,et al.Sedar:Reading floorplans like a human-using deep learning to enable human-inspired localisation[J].International Journal of Computer Vision,2020,128(5):1286-1310.
[17]WANG X,MARCOTTE R J,OLSON E.GLFP:Global localization from a floor plan[C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS).IEEE,2019:1627-1632.
[18]CHEN C,WANG R,VOGEL C,et al.F3Loc:fusion and filtering for floorplan localization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2024:18029-18038.
[19]CHU H,KIM D K,CHEN T.You are here:Mimicking the human thinking process in reading floor-plans[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:2210-2218.
[20]HOWARD-JENKINS H,RUIZ-SARMIENTO J R,PRISACARIU V A.Lalaloc:Latent layout localisation in dynamic,unvisited environments[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:10107-10116.
[21]CRUZ S,HUTCHCROFT W,LI Y,et al.Zillow indoor dataset:Annotated floor plans with 360° panoramas and 3d room layouts[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:2133-2143.
[22]HOWARD-JENKINS H,PRISACARIU V A.Lalaloc++:Global floor plan comprehension for layout localisation in unvisited environments[C]//European Conference on Computer Vision.Cham:Springer,2022:693-709.
[23]SARLIN P E,DUSMANU M,SCHÖNBERGER J L,et al.Lamar:Benchmarking localization and mapping for augmented reality[C]//European Conference on Computer Vision.Cham:Springer,2022:686-704.
[24]SARLIN P E,DETONE D,YANG T Y,et al.Orienternet:Visual localization in 2d public maps with neural matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:21632-21642.
[25]YU T,XIONG S W.Stereo Visual Localization and Mapping for Mobile Robot in Agricultural Environment[J].Computer Science,2023,50(12):185-191.
[26]PANEK V,KUKELOVA Z,SATTLER T.Visual localization using imperfect 3d models from the internet[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:13175-13186.
[27]AGARWAL S,FURUKAWA Y,SNAVELY N,et al.Building rome in a day[J].Communications of the ACM,2011,54(10):105-112.
[28]MILDENHALL B,SRINIVASAN P P,TANCIK M,et al.Nerf:Representing scenes as neural radiance fields for view synthesis[J].Communications of the ACM,2021,65(1):99-106.
[29]CHEN L,CHEN W,WANG R,et al.Leveraging neural radiance fields for uncertainty-aware visual localization[C]//2024 IEEE International Conference on Robotics and Automation(ICRA).IEEE,2024:6298-6305.
[30]BESL P J,MCKAY N D.Method for registration of 3-D shapes[C]//Sensor Fusion IV:Control Paradigms and Data Structures.SPIE,1992:586-606.
[31]BRACHMANN E,KRULL A,NOWOZIN S,et al.Dsac-differentiable ransac for camera localization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:6684-6692.
[32]SHOTTON J,GLOCKER B,ZACH C,et al.Scene coordinate regression forests for camera relocalization in RGB-D images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2013:2930-2937.
[33]VALENTIN J,NIESSNER M,SHOTTON J,et al.Exploiting uncertainty in regression forests for accurate camera relocalization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:4400-4408.
[34]NGUYEN S T,FONTAN A,MILFORD M,et al.Focustune:Tuning visual localization through focus-guided sampling[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.2024:3606-3615.
[35]WANG S,LASKAR Z,MELEKHOV I,et al.Hscnet++:Hierarchical scene coordinate classification and regression for visual localization with transformer[J].International Journal of Computer Vision,2024,132(7):2530-2550.
[36]REVAUD J,CABON Y,BRÉGIER R,et al.Sacreg:Scene-agnostic coordinate regression for visual localization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2024:688-698.
[37]LIU T R,YANG H K,LIU J M,et al.Reprojection Errors as Prompts for Efficient Scene Coordinate Regression[C]//European Conference on Computer Vision.Cham:Springer,2024:286-302.
[38]KENDALL A,GRIMES M,CIPOLLA R.Posenet:A convolutional network for real-time 6-dof camera relocalization[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:2938-2946.
[39]WALCH F,HAZIRBAS C,LEAL-TAIXE L,et al.Image-based localization using lstms for structured feature correlation[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:627-637.
[40]WU J,MA L,HU X.Delving deeper into convolutional neural networks for camera relocalization[C]//2017 IEEE International Conference on Robotics and Automation(ICRA).IEEE,2017:5644-5651.
[41]CHEN S,CAVALLARI T,PRISACARIU V A,et al.Map-relative pose regression for visual re-localization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2024:20665-20674.
[42]SONG X,LI H,LIANG L,et al.TransBoNet:Learning camera localization with transformer bottleneck and attention[J].Pattern Recognition,2024,146:109975.
[43]DING M,WANG Z,SUN J,et al.CamNet:Coarse-to-fine retrieval for camera re-localization[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:2871-2880.
[44]LIU X D,YU P.Cross-view Geo-visual Localization[J].Computer Science,2023,50(S2):407-413.
[45]LYU M,GUO X,ZHANG K,et al.A visual indoor localization method based on efficient image retrieval[J].Journal of Computer and Communications,2024,12(2):47-66.
[46]ZHANG B J,LIU G H,LI Z,et al.Image retrieval using compact deep semantic correlation descriptors[J].Information Processing & Management,2024,61(3):103608.
[47]ITO S,ENDRES F,KUDERER M,et al.W-rgb-d:floor-plan-based indoor global localization using a depth camera and wifi[C]//2014 IEEE International Conference on Robotics and Automation(ICRA).IEEE,2014:417-422.
[48]HOFFER E,AILON N.Deep metric learning using triplet network[C]//Similarity-based Pattern Recognition:Third International Workshop(SIMBAD 2015).Springer,2015:84-92.
[49]KRIZHEVSKY A,SUTSKEVER I,HINTON G E.Imagenet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems.2012.
[50]HU J,SHEN L,SUN G.Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:7132-7141.
[51]ZHENG J,ZHANG J,LI J,et al.Structured3d:A large photo-realistic dataset for structured 3d modeling[C]//Computer Vision-ECCV 2020:16th European Conference.Springer,2020:519-535.