Computer Science ›› 2020, Vol. 47 ›› Issue (7): 135-140.doi: 10.11896/jsjkx.190600157

• Computer Graphics & Multimedia • Previous Articles     Next Articles

Complex Scene Text Detection Based on Attention Mechanism

LIU Yan, WEN Jing   

  1. School of Computer and Information Technology,Shanxi University,Taiyuan 030006,China
  • Received:2019-06-26 Online:2020-07-15 Published:2020-07-16
  • About author:LIU Yan,born in 1990,master.Her main research interests include compu-ter vision and so on.
    WEN Jing,born in 1982,Ph.D,associate professor,master tutor,is a member of China Computer Federation.Her main research interests include computer vision,image processing and pattern re-cognition.
  • Supported by:
    This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (61703252),1331 Engineering Project of Shanxi Province and Shanxi Province Applied Basic Research Programs (201701D121053)

Abstract: Most of the traditional text detection methods are developed in the bottom-up manner,which usually start with low-level semantic character or stroke detection,followed by non-text component filtering,text line construction,and text line validation.However,the modeling,scale,typesetting and surrounding environment of the characters in the complex scene change drastically,and the task of detecting text is carried up by human under variety of visual granularities.It’s difficult for these bottom-up traditional methods to maintain the text features under different resolution,due to their dependency on the low lever features.Recently,deep learning methods have been widely used in text detection in order to extract more features under different scale.However,in the existing methods,the key feature information is not emphasized during the feature extraction process of each layer,and will be lost in the layer-to-layer feature mapping process.Therefore,the missing information will also lead to a lot of false-alarm and leak detection,which causes much more time-consuming.This paper proposes a complex scene text detection method based on the attention mechanism.The main contribution of this method is to introduce a visual attention layer in VGG16,and use the attention mechanism to enhance the significant information in the global information in the network.Experiments show that in the Ubuntu environment with GPU,this method can ensure the integrity of the text area in the detection of complex scene text pictures,reduce the fragmentation of the detection area and can achieve up to 87% recall rate and 89% precision rate.

Key words: Text detection, Deep learning, Attention mechanism

CLC Number: 

  • TP391
[1] GUPTA A,VEDALDI A,ZISSERMAN A.Synthetic data for text localisation in natural images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:2315-2324.
[2] HUANG W,QIAO Y,TANG X.Robust scene text detectionwith convolutional neural networks induced mser trees[C]//European Conference on Computer Vision (ECCV).2014:3.
[3] TIAN S,PAN Y,HUANG C,et al.Text flow:A unified textdetection system in natural scene images[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:4651-4659.
[4] YIN X C,PEI W Y,ZHANG J,et al.Multi-orientation scenetext detection with adaptive clustering[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2015,37(9):1930-1937.
[5] EPSHTEIN B,OFEK E,WEXLER Y.Detecting text in natural scenes with stroke width transform[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.IEEE,2010:2963-2970.
[6] REN S,HE K,GIRSHICK R,et al.Faster r-cnn:Towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems.2015:91-99.
[7] HE W,ZHANG X Y,YIN F,et al.Deep direct regression for multi-oriented scene text detection[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:745-753.
[8] LIU Y,JIN L.Deep matching prior network:Toward tightermulti-oriented text detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1962-1969.
[9] SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[10] GRAVES A,SCHMIDHUBER J.Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J].Neural Networks,2005,18(5/6):602-610.
[11] WOO S,PARK J,LEE J Y,et al.Cbam:Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:3-19.
[12] HE T,HUANG W,QIAO Y,et al.Text-attentional convolutional neural network for scene text detection[J].IEEE Transactions on Image Processing,2016,25(6):2529-2541.
[13] ZEILER M D,FERGUS R.Visualizing and understanding convolutional networks[C]//European Conference on Computer Vision.Cham:Springer.2014:818-833.
[14] TIAN Z,HUANG W,HE T,et al.Detecting text in natural image with connectionist text proposal network[C]//European Conference on Computer Vision.Cham:Springer,2016:56-72.
[15] ZHOU X,YAO C,WEN H,et al.EAST:an efficient and accurate scene text detector[C]//Proceedings of the IEEEConfe-rence on Computer Vision and Pattern Recognition.2017:5551-5560.
[16] LIU W,ANGUELOV D,ERHAN D,et al.Ssd:Single shotmultibox detector[C]//European Conference on Computer Vision.Cham:Springer,2016:21-37.
[17] ICDAR 2013 robust reading competition challenge 2 results[OL].
[18] BAI B,YIN F,LIU C L.Scene text localization using gradient local correlation[C]//2013 12th International Conference on Document Analysis and Recognition.IEEE,2013:1380-1384.
[19] YIN X C,YIN X,HUANG K,et al.Robust text detection innatural scene images[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2014,36(5):970-983.
[20] ZHANG Z,ZHANG C,SHEN W,et al.Multi-oriented Text Detection with Fully Convolutional Networks[C]//Computer Vision and Pattern Recognition.2016:4159-4167.
[21] YAO C,BAI X,SANG N,et al.Scene Text Detection via Holistic,Multi-Channel Prediction[J].arXiv:1606.09002,2016.
[22] LIU X,LIANG D,YAN S,et al.FOTS:Fast Oriented TextSpotting with a Unified Network[C]//Computer Vision and Pattern Recognition.2018:5676-5685.
[23] LI Y,YU Y,LI Z,et al.Pixel-Anchor:A Fast Oriented Scene Text Detector with Combined Networks[J].arXiv:1811.07432,2018.
[24] BAEK Y,LEE B,HAN D,et al.Character Region Awareness for Text Detection[C]//Computer Vision and Pattern Recognition.2019:9365-9374.
[1] ZHAO Jia-qi, WANG Han-zheng, ZHOU Yong, ZHANG Di, ZHOU Zi-yuan. Remote Sensing Image Description Generation Method Based on Attention and Multi-scale Feature Enhancement [J]. Computer Science, 2021, 48(1): 190-196.
[2] LIU Yang, JIN Zhong. Fine-grained Image Recognition Method Combining with Non-local and Multi-region Attention Mechanism [J]. Computer Science, 2021, 48(1): 197-203.
[3] WANG Rui-ping, JIA Zhen, LIU Chang, CHEN Ze-wei, LI Tian-rui. Deep Interest Factorization Machine Network Based on DeepFM [J]. Computer Science, 2021, 48(1): 226-232.
[4] YU Wen-jia, DING Shi-fei. Conditional Generative Adversarial Network Based on Self-attention Mechanism [J]. Computer Science, 2021, 48(1): 241-246.
[5] TONG Xin, WANG Bin-jun, WANG Run-zheng, PAN Xiao-qin. Survey on Adversarial Sample of Deep Learning Towards Natural Language Processing [J]. Computer Science, 2021, 48(1): 258-267.
[6] WANG Run-zheng, GAO Jian, HUANG Shu-hua, TONG Xin. Malicious Code Family Detection Method Based on Knowledge Distillation [J]. Computer Science, 2021, 48(1): 280-286.
[7] DING Yu, WEI Hao, PAN Zhi-song, LIU Xin. Survey of Network Representation Learning [J]. Computer Science, 2020, 47(9): 52-59.
[8] HE Xin, XU Juan, JIN Ying-ying. Action-related Network:Towards Modeling Complete Changeable Action [J]. Computer Science, 2020, 47(9): 123-128.
[9] YE Ya-nan, CHI Jing, YU Zhi-ping, ZHAN Yu-liand ZHANG Cai-ming. Expression Animation Synthesis Based on Improved CycleGan Model and Region Segmentation [J]. Computer Science, 2020, 47(9): 142-149.
[10] DENG Liang, XU Geng-lin, LI Meng-jie, CHEN Zhang-jin. Fast Face Recognition Based on Deep Learning and Multiple Hash Similarity Weighting [J]. Computer Science, 2020, 47(9): 163-168.
[11] PAN Zu-jiang, LIU Ning, ZHANG Wei, WANG Jian-yong. MTHAM:Multitask Disease Progression Modeling Based on Hierarchical Attention Mechanism [J]. Computer Science, 2020, 47(9): 185-189.
[12] BAO Yu-xuan, LU Tian-liang, DU Yan-hui. Overview of Deepfake Video Detection Technology [J]. Computer Science, 2020, 47(9): 283-292.
[13] ZHAO Wei, LIN Yu-ming, WANG Chao-qiang, CAI Guo-yong. Opinion Word-pairs Collaborative Extraction Based on Dependency Relation Analysis [J]. Computer Science, 2020, 47(8): 164-170.
[14] YUAN Ye, HE Xiao-ge, ZHU Ding-kun, WANG Fu-lee, XIE Hao-ran, WANG Jun, WEI Ming-qiang, GUO Yan-wen. Survey of Visual Image Saliency Detection [J]. Computer Science, 2020, 47(7): 84-91.
[15] WANG Wen-dao, WANG Run-ze, WEI Xin-lei, QI Yun-liang, MA Yi-de. Automatic Recognition of ECG Based on Stacked Bidirectional LSTM [J]. Computer Science, 2020, 47(7): 118-124.
Full text



[1] LEI Li-hui and WANG Jing. Parallelization of LTL Model Checking Based on Possibility Measure[J]. Computer Science, 2018, 45(4): 71 -75 .
[2] SUN Qi, JIN Yan, HE Kun and XU Ling-xuan. Hybrid Evolutionary Algorithm for Solving Mixed Capacitated General Routing Problem[J]. Computer Science, 2018, 45(4): 76 -82 .
[3] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[4] WU Jian-hui, HUANG Zhong-xiang, LI Wu, WU Jian-hui, PENG Xin and ZHANG Sheng. Robustness Optimization of Sequence Decision in Urban Road Construction[J]. Computer Science, 2018, 45(4): 89 -93 .
[5] SHI Wen-jun, WU Ji-gang and LUO Yu-chun. Fast and Efficient Scheduling Algorithms for Mobile Cloud Offloading[J]. Computer Science, 2018, 45(4): 94 -99 .
[6] ZHOU Yan-ping and YE Qiao-lin. L1-norm Distance Based Least Squares Twin Support Vector Machine[J]. Computer Science, 2018, 45(4): 100 -105 .
[7] LIU Bo-yi, TANG Xiang-yan and CHENG Jie-ren. Recognition Method for Corn Borer Based on Templates Matching in Muliple Growth Periods[J]. Computer Science, 2018, 45(4): 106 -111 .
[8] GENG Hai-jun, SHI Xin-gang, WANG Zhi-liang, YIN Xia and YIN Shao-ping. Energy-efficient Intra-domain Routing Algorithm Based on Directed Acyclic Graph[J]. Computer Science, 2018, 45(4): 112 -116 .
[9] CUI Qiong, LI Jian-hua, WANG Hong and NAN Ming-li. Resilience Analysis Model of Networked Command Information System Based on Node Repairability[J]. Computer Science, 2018, 45(4): 117 -121 .
[10] WANG Zhen-chao, HOU Huan-huan and LIAN Rui. Path Optimization Scheme for Restraining Degree of Disorder in CMT[J]. Computer Science, 2018, 45(4): 122 -125 .