Computer Science ›› 2026, Vol. 53 ›› Issue (1): 206-215. doi: 10.11896/jsjkx.250200090

• Computer Graphics & Multimedia •

Method for Symbol Detection in Substation Layout Diagrams Based on Text-Image Multimodal Fusion

FAN Jiabin, WANG Baohui, CHEN Jixuan   

  1. School of Software, Beihang University, Beijing 100191, China
  • Received: 2025-02-24 Revised: 2025-04-27 Published: 2026-01-08
  • About author: FAN Jiabin, born in 1991, postgraduate. His main research interests include computer vision and artificial intelligence.
    WANG Baohui, born in 1973, senior engineer, master's supervisor. His main research interests include software architecture, big data, artificial intelligence, etc.

Abstract: Manual identification of symbols in substation layout drawings is inconvenient and inefficient, and the resulting recognition data is difficult to manage. To address these problems, this paper proposes a morphology-based segmentation method for large-size drawings and a drawing symbol detection method based on text-image multimodal fusion. Combined with post-processing methods for symbol detection, these form an adaptable approach to symbol detection in large-size layout drawings that can be generalized to other fields. The text-image multimodal fusion detection model builds on the open-set object detection model YOLO-World and introduces three modules, CTCM, SOFEM, and CJFFM, which significantly improve the model's symbol recognition performance. The proposed methods are used to detect symbols in a dataset of actual general layout drawings of high-speed railway traction substations. Compared with the original model, the improved model maintains a similar level of complexity and reaches an average precision of 97.5% for symbol recognition, with mAP@50:95 and mAP@90 increasing by 1.1% and 3.0%, respectively.
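The abstract reports mAP@50:95 and mAP@90 without defining them on this page. Assuming the usual COCO-style convention (mAP@50:95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05, and mAP@90 uses the single threshold 0.90), the following minimal sketch shows why the stricter thresholds are demanding for small symbols. The box coordinates and the helper name `thresholds_passed` are hypothetical and are not taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds averaged by mAP@50:95 (assumed COCO-style grid).
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Return the thresholds at which pred would match truth (hypothetical helper)."""
    score = iou(pred, truth)
    return [t for t in THRESHOLDS if score >= t]


# Hypothetical 40x40 px symbol and a prediction shifted by 2 px in x and y.
truth = (100, 100, 140, 140)
pred = (102, 102, 142, 142)

print(f"IoU = {iou(pred, truth):.3f}")
print("matched at thresholds:", thresholds_passed(pred, truth))
```

In this example the 2-pixel shift yields an IoU of about 0.82, so the prediction counts as a match up to the 0.80 threshold but not at 0.90. This is one way to read a gain in mAP@90 as an improvement in localization accuracy rather than only in classification.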

Key words: Text-image multimodal, Feature fusion, Attention mechanism, Small object detection, Diagram symbol recognition

CLC Number: TP301