Computer Science, 2022, Vol. 49, Issue (2): 248-255. doi: 10.11896/jsjkx.201100072

• Artificial Intelligence •

  • Corresponding author: LIU Chun-ping (cpliu@suda.edu.cn)
  • Author e-mail: 20184227001@stu.suda.edu.cn

Scene Text Detection Algorithm Based on Enhanced Feature Pyramid Network

SHAO Hai-lin1, JI Yi1, LIU Chun-ping1, XU Yun-long2   

  1 School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  2 Applied Technology College of Soochow University, Suzhou, Jiangsu 215300, China
  • Received: 2020-11-09 Revised: 2021-05-02 Online: 2022-02-15 Published: 2022-02-23
  • About author: SHAO Hai-lin, born in 1995, postgraduate, is a member of China Computer Federation. Her main research interests include scene text detection.
    LIU Chun-ping, born in 1971, Ph.D., professor, Ph.D. supervisor. Her main research interests include computer vision and image analysis and recognition, in particular visual saliency detection and object and scene understanding.
  • Supported by:
    National Natural Science Foundation of China (61972059, 61773272, 61602332), Natural Science Foundation of Jiangsu Higher Education Institutions of China (19KJA230001), Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172016K08), and Priority Academic Program Development of Jiangsu Higher Education Institutions.

Abstract: Scene text detection helps machines understand image content and is widely used in fields such as intelligent transportation, scene understanding, and intelligent navigation. Existing scene text detection algorithms do not make full use of high-level semantic information and spatial information, which limits both the model's ability to classify pixels against complex backgrounds and its ability to detect and locate text instances of different scales. To address these problems, a scene text detection algorithm based on an enhanced feature pyramid network is proposed. The algorithm comprises a ratio invariant feature enhanced (RIFE) module and a rebuild spatial resolution (RSR) module. Acting as a residual branch, the RIFE module strengthens the transmission of high-level semantic information through the network, which improves classification and reduces the false positive and false negative rates. The RSR module rebuilds the resolution of multi-layer features and uses rich spatial information to refine boundary localization. Experimental results show that the proposed algorithm improves detection performance on the multi-oriented text dataset ICDAR2015, the curved text dataset Total-Text, and the long text dataset MSRA-TD500.

Key words: Boundary location, Feature pyramid network, Scene text detection, Semantic information, Spatial information

CLC Number: TP391
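
The abstract describes a feature pyramid network extended with a residual branch for high-level semantics (RIFE) and a step that rebuilds the resolution of multi-layer features (RSR), but it does not give their internal design. The following is only a minimal sketch of the general idea, under loud assumptions: feature maps are single-channel nested lists, `top_down` is the standard FPN top-down fusion, the residual term (a global mean added to the coarsest map) is a hypothetical stand-in for RIFE, and `merge_to_finest` (upsample every level and sum) is a hypothetical stand-in for RSR. None of these function names or design choices come from the paper.

```python
# Illustrative sketch only. The residual term and the final merge are
# hypothetical stand-ins, not the paper's RIFE or RSR definitions.

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add(a, b):
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def global_mean(fmap):
    """Mean over all positions, used here as a crude global context."""
    vals = [v for row in fmap for v in row]
    return sum(vals) / len(vals)


def top_down(levels, residual=True):
    """Standard FPN top-down fusion.

    levels: backbone maps ordered fine -> coarse, each half the size
    of the previous one. Returns fused maps in the same order.
    """
    top = levels[-1]
    if residual:
        # Hypothetical residual branch: add the coarsest map's global
        # context back onto it before it flows down the pyramid.
        ctx = global_mean(top)
        top = [[v + ctx for v in row] for row in top]
    fused = [top]
    for lateral in reversed(levels[:-1]):
        fused.insert(0, add(lateral, upsample2x(fused[0])))
    return fused


def merge_to_finest(fused):
    """Upsample every level to the finest resolution and sum them
    (hypothetical stand-in for rebuilding spatial resolution)."""
    merged = fused[0]
    for k, fmap in enumerate(fused[1:], start=1):
        for _ in range(k):
            fmap = upsample2x(fmap)
        merged = add(merged, fmap)
    return merged


# Toy pyramid: 4x4, 2x2, and 1x1 maps.
c2 = [[1.0] * 4 for _ in range(4)]
c3 = [[1.0] * 2 for _ in range(2)]
c4 = [[2.0]]
merged = merge_to_finest(top_down([c2, c3, c4]))
```

In this toy setup the residual term changes only the coarsest level directly, yet its effect reaches every finer level through the top-down additions, which is the sense in which a residual branch at the top of the pyramid can strengthen semantic information throughout the network.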