Computer Science (计算机科学) ›› 2024, Vol. 51 ›› Issue (6): 256-263. doi: 10.11896/jsjkx.230500230

• Computer Graphics & Multimedia •

Scene Text Detection Algorithm Based on Feature Enhancement

GAO Nan, ZHANG Lei, LIANG Ronghua, CHEN Peng, FU Zheng

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2023-05-31 Revised: 2023-10-25 Online: 2024-06-15 Published: 2024-06-05
  • Corresponding author: GAO Nan (gaonan@zjut.edu.cn)
  • About author: GAO Nan, Ph.D., born in 1983, is a member of CCF (No. 83932F). Her main research interests include cross-modal generation and retrieval, natural language processing, and medical image processing.
  • Supported by:
    National Natural Science Foundation of China (61702456, 62036009, U1909203) and National Key Research and Development Program of China (2020YFB1707700).

Abstract: To address missed and false detections of text in natural scene images caused by complex backgrounds and varying scales, this paper proposes a scene text detection algorithm based on feature enhancement. In the feature pyramid fusion stage, a Dual-domain Attention Feature Fusion Module (D2AAFM) is proposed, which better fuses feature maps of different semantic levels and scales and thereby improves the representation of text information. In addition, because deep feature maps lose semantic information during upsampling and fusion, a Multi-scale Spatial Perception Module (MSPM) is proposed. It enlarges the receptive field to capture broader contextual information and strengthens the text semantics of the deep feature maps, which effectively reduces missed and false detections. To evaluate the proposed algorithm, experiments are conducted on the public datasets ICDAR2015, CTW1500, and MSRA-TD500, where its F-measure reaches 82.8%, 83.4%, and 85.3%, respectively. The results show that the algorithm detects text well across different datasets.

Key words: Deep learning, Scene text detection, Attention mechanism, Multi-scale feature fusion, Dilated convolution
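
The abstract attributes the MSPM's gain to a larger receptive field, and the key words name dilated convolution as the mechanism. The sketch below is not the paper's module. It only illustrates, in plain Python, how a dilation rate widens the span one kernel covers without adding weights. The function names, the kernel size of 3, and the dilation rates 1, 2, and 4 are assumptions chosen for illustration.

```python
# Illustrative sketch only: this is NOT the paper's MSPM implementation.
# Kernel size and dilation rates are assumed values for demonstration.

def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D dilated convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps sample the input every `dilation` positions.
        acc = sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        out.append(acc)
    return out


if __name__ == "__main__":
    for d in (1, 2, 4):  # hypothetical dilation rates
        # A 3-tap kernel spans 3, 5, and 9 input positions respectively.
        print(f"dilation={d}: span={effective_kernel_size(3, d)}")
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=2))
```

With the same three weights, the span grows from 3 to 9 positions as the dilation rate goes from 1 to 4. This is why parallel branches with different rates can gather context at several scales at a fixed parameter cost.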

CLC Number: 

  • TP391