Computer Science, 2026, Vol. 53, Issue (1): 206-215. doi: 10.11896/jsjkx.250200090

• Computer Graphics & Multimedia •

Method for Symbol Detection in Substation Layout Diagrams Based on Text-Image Multimodal Fusion

FAN Jiabin, WANG Baohui, CHEN Jixuan

  1. School of Software, Beihang University, Beijing 100191, China
  • Received: 2025-02-24 Revised: 2025-04-27 Online: 2026-01-08
  • Corresponding author: WANG Baohui (wangbh@buaa.edu.cn)
  • About author: FAN Jiabin (jiabin_fan@163.com), born in 1991, postgraduate. His main research interests include computer vision and artificial intelligence.
    WANG Baohui, born in 1973, senior engineer, master supervisor. His main research interests include software architecture, big data, artificial intelligence, etc.

Abstract: Manual identification of substation layout drawings is inconvenient and inefficient, and the resulting recognition data is difficult to manage. To address these problems, this paper proposes a morphology-based segmentation method for large-size drawings and a drawing symbol detection method based on text-image multimodal fusion. Combined with post-processing for symbol detection, they form an approach to symbol detection in large-size layout drawings that can be generalized to other fields. The detection model improves on the open-set object detection model YOLO-World by introducing a Convolutional Attention Collaboration Module (CTCM), a Small Object Feature Enhancement Module (SOFEM), and a Context-aware Joint Feature Fusion Module (CJFFM), which noticeably improve symbol recognition accuracy. The proposed methods are used to detect symbols in a dataset of real high-speed railway traction substation layout drawings. Compared with the original model, and without a significant increase in model complexity, the improved model reaches an average precision of 97.5% for symbol recognition, with mAP@50:95 and mAP@90 increasing by 1.1% and 3.0%, respectively.

Key words: Text-image multimodal, Feature fusion, Attention mechanism, Small object detection, Diagram symbol recognition
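
The abstract does not describe the segmentation procedure in detail. As a rough illustration only of what a morphology-based split of a large drawing might look like, the sketch below dilates a binarized drawing so that nearby strokes merge, then takes the bounding box of each connected region as a candidate sub-image. The function names (`dilate`, `component_boxes`, `split_regions`), the square structuring element, and the `radius` parameter are all assumptions made for this example and are not taken from the paper.

```python
from collections import deque

def dilate(grid, radius):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x]:
                # Switch on every pixel inside the square around (y, x).
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out

def component_boxes(grid):
    """Bounding boxes (x_min, y_min, x_max, y_max) of 4-connected regions."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] and not seen[y][x]:
                # Breadth-first flood fill from an unvisited foreground pixel.
                seen[y][x] = True
                queue = deque([(y, x)])
                x0, y0, x1, y1 = x, y, x, y
                while queue:
                    cy, cx = queue.popleft()
                    x0, y0 = min(x0, cx), min(y0, cy)
                    x1, y1 = max(x1, cx), max(y1, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

def split_regions(binary_drawing, radius=1):
    """Hypothetical helper: dilate, then box each merged region.

    Boxes are measured on the dilated mask, so each one includes a
    margin of up to `radius` pixels around the original strokes.
    """
    return component_boxes(dilate(binary_drawing, radius))

# Toy 6 x 12 binarized "drawing" with two clusters of strokes.
drawing = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
regions = split_regions(drawing, radius=1)
```

On this toy input the isolated strokes merge into two regions, one per cluster. In practice the structuring element size would need tuning to the drawing resolution, and a library such as OpenCV would likely replace these pure-Python loops.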

CLC Number:

  • TP301