Computer Science, 2020, Vol. 47, Issue (7): 135-140. doi: 10.11896/jsjkx.190600157

• Computer Graphics & Multimedia

Complex Scene Text Detection Based on Attention Mechanism

LIU Yan, WEN Jing   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2019-06-26   Online: 2020-07-15   Published: 2020-07-16
  • About author: LIU Yan, born in 1990, master. Her main research interests include computer vision.
    WEN Jing, born in 1982, Ph.D., associate professor, master's supervisor, and a member of the China Computer Federation. Her main research interests include computer vision, image processing and pattern recognition.
  • Supported by:
    This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (61703252), the 1331 Engineering Project of Shanxi Province, and the Shanxi Province Applied Basic Research Programs (201701D121053).

Abstract: Most traditional text detection methods work bottom-up: they start with low-level character or stroke detection, followed by non-text component filtering, text line construction and text line verification. However, the shape, scale, layout and surroundings of characters in complex scenes vary drastically, and humans detect text at a variety of visual granularities. Because these traditional methods depend on low-level features, they struggle to preserve text features across different resolutions. Recently, deep learning methods have been widely adopted for text detection because they can extract features at multiple scales. However, existing methods do not emphasize key feature information during feature extraction at each layer, so this information can be lost as features are mapped from one layer to the next. The missing information leads to many false alarms and missed detections, which in turn increases processing time. This paper proposes a complex scene text detection method based on the attention mechanism. Its main contribution is a visual attention layer introduced into VGG16, which uses the attention mechanism to enhance salient information within the global information of the network. Experiments on Ubuntu with a GPU show that the method preserves the integrity of text regions when detecting text in complex scene images, reduces fragmentation of the detected regions, and achieves up to 87% recall and 89% precision.
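The abstract does not describe the internal form of the attention layer added to VGG16. As an assumption, the sketch below uses a CBAM-style block (channel attention followed by spatial attention, after Woo et al., 2018) to illustrate how attention weights in (0, 1) can re-scale a convolutional feature map so that salient responses are emphasized relative to the rest before they reach later layers. The function names, the reduction ratio `r`, the 7x7 kernel and the random weights (stand-ins for learned parameters) are all hypothetical; this is a forward-pass illustration, not the authors' layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(f, w1, w2):
    """f: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns per-channel weights (C,)."""
    avg = f.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = f.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        # Shared two-layer MLP with a ReLU bottleneck (hypothetical weights).
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(f, k):
    """f: (C, H, W); k: (2, kh, kw) odd-sized kernel. Returns per-pixel weights (H, W)."""
    desc = np.stack([f.mean(axis=0), f.max(axis=0)])  # channel mean and max maps -> (2, H, W)
    _, h, w = desc.shape
    _, kh, kw = k.shape
    # Zero "same" padding so the output keeps the H x W size.
    padded = np.pad(desc, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((h, w))
    for i in range(kh):  # 2-D cross-correlation, summed over the two descriptor maps
        for j in range(kw):
            out += (k[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]).sum(axis=0)
    return sigmoid(out)


def attention_block(f, w1, w2, k):
    """Re-weight channels first, then spatial positions; output shape equals input shape."""
    f = f * channel_attention(f, w1, w2)[:, None, None]
    return f * spatial_attention(f, k)[None, :, :]


# Usage with a stand-in tensor shaped like a VGG16 conv5 feature map (512 channels).
rng = np.random.default_rng(0)
C, H, W, r = 512, 14, 14, 16
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.01
w2 = rng.standard_normal((C, C // r)) * 0.01
k = rng.standard_normal((2, 7, 7)) * 0.1
print(attention_block(feat, w1, w2, k).shape)  # (512, 14, 14)
```

Because every weight lies in (0, 1), the block never amplifies a response in absolute terms; it suppresses less relevant channels and positions, which is one way to make key features stand out relative to background before the next layer.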
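The abstract reports recall and precision without stating the matching rule. As an illustration only, the sketch below assumes a common simplification: a detected box is a true positive if it overlaps a not-yet-matched ground-truth box with intersection-over-union (IoU) of at least 0.5, so precision = TP / (number of detections) and recall = TP / (number of ground-truth boxes). Standard benchmarks such as ICDAR 2013 use a more elaborate protocol that also credits one-to-many and many-to-one matches, so this is not the paper's evaluation code; `iou`, `precision_recall` and the 0.5 threshold are hypothetical choices.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, thresh=0.5):
    """Greedy one-to-one matching: each ground-truth box can be claimed at most once."""
    matched = set()
    tp = 0
    for d in detections:
        best, best_iou = None, thresh
        for gi, g in enumerate(ground_truth):
            if gi in matched:
                continue
            v = iou(d, g)
            if v >= best_iou:  # keep the best unmatched box at or above the threshold
                best, best_iou = gi, v
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall


gt = [(0, 0, 10, 10), (20, 0, 30, 10)]
det = [(1, 0, 11, 10), (50, 50, 60, 60), (20, 0, 30, 10)]
print(precision_recall(det, gt))  # precision 2/3 (one false alarm), recall 1.0
```

Under this rule a false alarm lowers precision and a missed text line lowers recall, which is why fewer false alarms and missed detections, as claimed in the abstract, would show up in both numbers.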

Key words: Attention mechanism, Deep learning, Text detection

CLC Number: TP391