Computer Science, 2020, Vol. 47, Issue (7): 135-140. doi: 10.11896/jsjkx.190600157

• Computer Graphics & Multimedia •

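Complex Scene Text Detection Based on Attention Mechanism

LIU Yan, WEN Jing

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2019-06-26  Online: 2020-07-15  Published: 2020-07-16
  • Corresponding author: WEN Jing (wjing@sxu.edu.cn)
  • Author e-mail: 449258197@qq.com
  • Supported by:
    This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (61703252), the 1331 Engineering Project of Shanxi Province and the Shanxi Province Applied Basic Research Program (201701D121053).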

  • About author: LIU Yan, born in 1990, master. Her main research interests include computer vision, among others.
    WEN Jing, born in 1982, Ph.D., associate professor, master's supervisor, member of the China Computer Federation. Her main research interests include computer vision, image processing and pattern recognition.

Abstract: Most traditional text detection methods follow a bottom-up pipeline: they start by detecting low-level semantic elements such as characters or strokes, then filter out non-text components, construct text lines, and finally verify those lines. In complex scenes, however, the shape, scale, layout and surroundings of text vary drastically, so the human visual system detects text at several visual granularities. Because bottom-up methods depend heavily on low-level feature detection, they struggle to capture text features robustly across these granularities. Recently, deep learning methods have been applied to text detection to preserve text features at different resolutions. However, existing methods do not emphasize key feature information when extracting features at each layer, and information is lost in the layer-to-layer feature mapping. As a result, some non-text objects are misclassified as text, which makes detection time-consuming and produces many false and missed detections. This paper proposes a complex scene text detection method based on the attention mechanism. Its main contribution is a visual attention layer introduced into VGG16, which applies attention at a fine granularity to enhance the salient information within the network's global information. Experiments in an Ubuntu environment with a GPU show that the method preserves the integrity of text regions in complex scene images, reduces fragmentation of the detected regions, and achieves a recall of up to 87% and a precision of up to 89%.

Key words: Attention mechanism, Deep learning, Text detection

CLC number: TP391
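The abstract states that a visual attention layer is added to VGG16 to emphasize salient features, but it does not describe the layer's structure. As a rough illustration only, the sketch below shows one common way such a layer can reweight a convolutional feature map: a simplified channel-then-spatial attention in the style of CBAM (Woo et al., 2018). This is an assumption, not the authors' layer. The parameter names (`ch_scale`, `ch_bias`, `w_avg`, `w_max`, `sp_bias`) are hypothetical stand-ins for learned weights, CBAM's shared MLP and 7x7 convolution are replaced by simple per-channel and 1x1 affine maps, and feature maps are plain nested lists so the example runs without a deep learning framework.

```python
import math
from typing import List

# Feature map indexed as fmap[channel][row][col]
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(fmap: FeatureMap, scale: List[float], bias: List[float]) -> FeatureMap:
    """Scale channel c by sigmoid(scale[c] * mean_c + bias[c]).

    mean_c is the global average of channel c. `scale` and `bias` are
    hypothetical stand-ins for learned weights (CBAM uses a shared MLP
    over average- and max-pooled descriptors instead).
    """
    out = []
    for c, channel in enumerate(fmap):
        values = [v for row in channel for v in row]
        weight = sigmoid(scale[c] * (sum(values) / len(values)) + bias[c])
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(fmap: FeatureMap, w_avg: float, w_max: float, bias: float) -> FeatureMap:
    """Scale position (i, j) by sigmoid(w_avg * mean + w_max * max + bias).

    mean and max are taken across channels at (i, j). This 1x1 combination
    is a simplified stand-in for CBAM's 7x7 convolution over the pooled maps.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            column = [fmap[c][i][j] for c in range(channels)]
            mean = sum(column) / channels
            mask[i][j] = sigmoid(w_avg * mean + w_max * max(column) + bias)
    return [
        [[fmap[c][i][j] * mask[i][j] for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]


def attention_block(
    fmap: FeatureMap,
    ch_scale: List[float],
    ch_bias: List[float],
    w_avg: float,
    w_max: float,
    sp_bias: float,
) -> FeatureMap:
    # CBAM ordering: channel attention first, then spatial attention on the refined map.
    refined = channel_attention(fmap, ch_scale, ch_bias)
    return spatial_attention(refined, w_avg, w_max, sp_bias)


# Toy 2-channel, 2x2 map with a strong response at position (0, 0).
fmap = [[[4.0, 0.0], [0.0, 0.0]], [[2.0, 0.5], [0.5, 0.5]]]
out = attention_block(fmap, [1.0, 1.0], [0.0, 0.0], w_avg=1.0, w_max=1.0, sp_bias=-1.0)
# Channel 1 keeps a larger fraction of its value at (0, 0) than at (1, 1).
print(out[1][0][0] / fmap[1][0][0] > out[1][1][1] / fmap[1][1][1])  # True
```

Because every weight is a sigmoid output in (0, 1), this block never enlarges an activation; it only suppresses weak channels and positions relative to strong ones, which is one concrete sense in which attention "emphasizes" salient features. In a trainable version, the scalar parameters would be learned jointly with the VGG16 backbone by backpropagation.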