Computer Science, 2020, Vol. 47, Issue (8): 221-226. doi: 10.11896/jsjkx.190500017


End-to-end Network Structure Optimization of Scene Text Recognition Based on Residual Connection

HUANG Jin-xing, PAN Xiang, ZHENG He-rong   

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Online: 2020-08-15 Published: 2020-08-10
  • About author: ZHENG He-rong, born in 1971, Ph.D., professor, supervisor. His main research interests include pattern recognition and image processing.
  • Supported by: This work was supported by the National Natural Science Foundation of China (61871350).

Abstract: Existing text recognition methods suffer from reduced recognition accuracy because their networks are not deep enough. To address this issue, this paper proposes an improved end-to-end text recognition network structure. Firstly, the algorithm treats the text as a sequence and uses residual modules to convert the text image into a sequence of column features for the recurrent layer. The residual structure increases network depth so that the network can capture a stronger feature representation of the text image. Meanwhile, the residual module lets the stacked layers learn a residual mapping, which improves the convergence of the network even though the number of layers increases significantly. Secondly, a recurrent layer models the context of these text features, and its outputs are fed into a softmax layer to predict the corresponding labels, which enables the recognition of text of arbitrary length. The recurrent layer uses Long Short-Term Memory (LSTM) to learn the dependencies within the text sequence and to alleviate the vanishing gradient problem in long-sequence training. Finally, label transcription and decoding are performed with the optimal path method, which finds the path with the maximum probability and outputs the label sequence corresponding to that path as the optimal sequence. The improved network structure increases network depth, improves the feature description of text images, and enhances stability under noise. In the experiments, the proposed method is compared with existing typical algorithms on multiple test datasets (ICDAR2003, ICDAR2013, SVT and IIIT5K). The results show that the proposed network structure achieves better text recognition accuracy, which verifies its effectiveness.
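The optimal path decoding step described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a CTC-style output in which each column of the feature sequence yields a softmax distribution over the character labels plus a blank label, and that a path is mapped to a label sequence by merging consecutive repeats and then removing blanks. The function name `best_path_decode`, the blank index 0, the toy alphabet and the toy probabilities are all hypothetical.

```python
from math import prod
from typing import List, Optional, Sequence, Tuple


def best_path_decode(
    probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    blank: int = 0,  # assumption: index of the blank label
) -> Tuple[str, float]:
    """Optimal (best) path decoding of per-frame softmax outputs.

    probs[t][k] is the probability of label k at frame (column) t, and
    labels[k] is the symbol for label k; labels[blank] is the blank.
    Returns the decoded string and the probability of the chosen path.
    """
    # Step 1: choose the most probable label at every frame. Frames are scored
    # independently, so this path maximizes the product of probabilities.
    path: List[int] = [max(range(len(frame)), key=lambda k: frame[k]) for frame in probs]
    path_prob = prod(probs[t][k] for t, k in enumerate(path))

    # Step 2: map the path to a label sequence by merging consecutive
    # repeats and then dropping blanks ("aa-b-b" -> "abb").
    out: List[str] = []
    prev: Optional[int] = None
    for k in path:
        if k != prev and k != blank:
            out.append(labels[k])
        prev = k
    return "".join(out), path_prob


if __name__ == "__main__":
    labels = ["-", "a", "b"]  # hypothetical alphabet; "-" is the blank
    probs = [  # hypothetical softmax outputs for 6 feature columns
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.6, 0.3, 0.1],
        [0.1, 0.2, 0.7],
        [0.5, 0.1, 0.4],
        [0.1, 0.1, 0.8],
    ]
    text, p = best_path_decode(probs, labels)
    print(text)  # abb
```

The blank between the two "b" frames is what keeps the repeated letter from being merged. Note that the single most probable path is not guaranteed to give the most probable label sequence, because several different paths can map to the same sequence; optimal path decoding trades that exactness for a simple linear-time pass.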

Key words: Residual connection, Scene text recognition, Stacked layer, Network depth, Optimal path

CLC Number: TP311

  • About author: HUANG Jin-xing, born in 1995, postgraduate. Her main research interests include pattern recognition and image processing.