Computer Science ›› 2020, Vol. 47 ›› Issue (7): 135-140. doi: 10.11896/jsjkx.190600157

• Computer Graphics & Multimedia •

Complex Scene Text Detection Based on Attention Mechanism

LIU Yan, WEN Jing   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2019-06-26 Online: 2020-07-15 Published: 2020-07-16
  • About author: LIU Yan, born in 1990, master's degree. Her main research interests include computer vision.
    WEN Jing, born in 1982, Ph.D., associate professor, master's supervisor, is a member of China Computer Federation. Her main research interests include computer vision, image processing and pattern recognition.
  • Supported by:
    This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (61703252), the 1331 Engineering Project of Shanxi Province, and the Shanxi Province Applied Basic Research Programs (201701D121053).

Abstract: Most traditional text detection methods work bottom-up: they start with low-level character or stroke detection, followed by non-text component filtering, text line construction, and text line verification. However, the shape, scale, layout, and surroundings of characters in complex scenes vary drastically, and humans detect text at a variety of visual granularities. Because they depend on low-level features, these bottom-up methods have difficulty preserving text features across resolutions. Recently, deep learning methods have been widely used in text detection to extract features at multiple scales. In existing methods, however, key feature information is not emphasized during feature extraction in each layer and is lost in the layer-to-layer feature mapping. The missing information leads to many false alarms and missed detections and makes detection more time-consuming. This paper proposes a complex scene text detection method based on the attention mechanism. Its main contribution is to introduce a visual attention layer into VGG16 and to use the attention mechanism to enhance the salient information within the network's global information. Experiments show that, in an Ubuntu environment with a GPU, the method preserves the integrity of text regions when detecting text in complex scene images, reduces fragmentation of the detected regions, and achieves up to 87% recall and 89% precision.
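
The abstract does not specify the form of the attention layer, so the following is only a minimal illustrative sketch of the general idea of re-weighting a convolutional feature map so that salient positions are emphasized. It assumes a generic spatial gate (a sigmoid of the channel-wise mean); the function name `spatial_attention` and the toy feature map are hypothetical and are not taken from the paper.

```python
import math


def spatial_attention(features):
    """Re-weight a C x H x W feature map (nested lists) with a spatial gate.

    Assumption: the gate is sigmoid(mean over channels). This is a generic
    stand-in, not the attention layer used by the authors.
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])

    # One attention weight in (0, 1) per spatial position.
    gate = []
    for y in range(height):
        row = []
        for x in range(width):
            mean = sum(features[c][y][x] for c in range(channels)) / channels
            row.append(1.0 / (1.0 + math.exp(-mean)))
        gate.append(row)

    # Scale every channel by the shared spatial gate.
    weighted = [
        [[features[c][y][x] * gate[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]
    return weighted, gate


# Toy 2-channel, 2 x 2 feature map (hypothetical values).
feature_map = [
    [[4.0, 0.0], [0.0, -4.0]],
    [[4.0, 0.0], [0.0, -4.0]],
]
weighted, gate = spatial_attention(feature_map)
```

In this toy input, positions with strong positive responses receive a gate close to 1 and are passed on almost unchanged, while positions with negative responses are suppressed. In a real network the gate would typically be produced by learned layers rather than a fixed mean.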

Key words: Text detection, Deep learning, Attention mechanism

CLC Number: 

  • TP391