Computer Science ›› 2024, Vol. 51 ›› Issue (6): 256-263. doi: 10.11896/jsjkx.230500230

• Computer Graphics & Multimedia •

Scene Text Detection Algorithm Based on Feature Enhancement

GAO Nan, ZHANG Lei, LIANG Ronghua, CHEN Peng, FU Zheng   

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2023-05-31 Revised: 2023-10-25 Online: 2024-06-15 Published: 2024-06-05
  • About author: GAO Nan, Ph.D., born in 1983, is a member of CCF (No. 83932F). Her main research interests include cross-modal generation and retrieval, natural language processing, and medical image processing.
  • Supported by:
    National Natural Science Foundation of China (61702456, 62036009, U1909203) and National Key Research and Development Program of China (2020YFB1707700).

Abstract: To address missed and false detections of text in natural scene images caused by complex backgrounds and variable scales, this paper proposes a scene text detection algorithm based on feature enhancement. In the feature pyramid fusion stage, a dual-domain attention feature fusion module (D2AAFM) is proposed, which better fuses feature maps of different semantic levels and scales and thereby improves the representation of text information. In addition, because semantic information is lost when the deeper feature maps of the network are up-sampled and fused, a multi-scale spatial perception module (MSPM) is proposed. It enlarges the receptive field to capture broader contextual information, which enhances the semantic features of text in the higher-level feature maps and effectively reduces missed and false detections. The algorithm is evaluated on the public datasets ICDAR2015, CTW1500 and MSRA-TD500, where its F-measure reaches 82.8%, 83.4% and 85.3%, respectively. The experimental results show that the algorithm detects text well across different datasets.
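
The abstract states that the MSPM obtains wider context by enlarging the receptive field, and the key words name dilated convolution as the mechanism. As a rough illustration of that general idea only, the Python sketch below computes how far a dilated kernel reaches. The function names, the 3x3 kernel size and the dilation rates (1, 2, 4) are assumptions chosen for illustration; they are not taken from the paper and do not describe the actual MSPM design.

```python
from typing import Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    field = 1
    for dilation in dilations:
        # Each stride-1 layer widens the field by (effective kernel size - 1).
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Hypothetical dilation rates for 3x3 kernels (assumed, not from the paper).
branch_spans = {d: effective_kernel_size(3, d) for d in (1, 2, 4)}
stacked_field = stacked_receptive_field(3, [1, 2, 4])
```

Under these assumed settings, a single 3x3 kernel with dilation 4 spans 9 input pixels while still using only 9 weights, which is why dilation is a common way to gather larger context on deep feature maps without adding parameters.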

Key words: Deep learning, Scene text detection, Attention mechanisms, Multi-scale feature fusion, Dilated convolution

CLC Number: 

  • TP391