Computer Science (计算机科学), 2020, Vol. 47, Issue 8: 221-226. doi: 10.11896/jsjkx.190500017
黄金星, 潘翔, 郑河荣
HUANG Jin-xing, PAN Xiang, ZHENG He-rong
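Abstract: Existing text recognition networks are too shallow, which limits their recognition accuracy. To address this, this paper proposes an improved end-to-end text recognition network. First, the text is treated as a sequence: residual modules are used to split the text column by column into feature vectors, which are fed to the recurrent layers. The residual structure increases the depth of the convolutional network, so that the network retains the best representation of the text image and captures the text information. In addition, the residual modules use stacked layers to learn residual mappings, which improves the convergence of the network as the number of layers grows. Next, the recurrent layers model the context of the text feature sequence, and the result is fed to a Softmax layer to predict the labels for the sequence, which allows text of arbitrary length to be recognized. The recurrent layers use long short-term memory (LSTM) networks to learn the dependencies within the text and to address the "vanishing gradient" problem that arises when training on long sequences. Finally, text labels are transcribed with the best-path method, which finds the path with the maximum probability and outputs the sequence corresponding to that path as the optimal sequence. The improved network is deeper, which strengthens its ability to describe text-image features and its stability under noise. The proposed algorithm is compared experimentally with existing representative algorithms on several test datasets (ICDAR2003, ICDAR2013, SVT and IIIT5K). The results show that the network achieves higher scene text recognition accuracy, which verifies its effectiveness.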
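The best-path transcription step described in the abstract can be illustrated with a minimal sketch. It assumes a CTC-style output, which the abstract does not spell out: each time step (one column of the text image) yields a Softmax distribution over a blank label plus the characters, the most probable path takes the top label at every step, and the final text is obtained by merging repeated labels and dropping blanks. The function name `best_path_decode`, the blank index `0`, and the toy alphabet are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label 0 is a CTC-style "blank"; labels 1..N map to characters


def best_path_decode(probs: Sequence[Sequence[float]], alphabet: str,
                     blank: int = BLANK) -> str:
    """Greedy (best-path) decoding of per-time-step Softmax outputs.

    probs[t][k] is the probability of label k at time step t.
    Label k >= 1 corresponds to alphabet[k - 1].
    """
    # Step 1: the most probable path picks the highest-scoring label per step.
    path: List[int] = [
        max(range(len(frame)), key=frame.__getitem__) for frame in probs
    ]

    # Step 2: collapse consecutive repeats, then drop blank labels.
    decoded: List[str] = []
    previous = None
    for label in path:
        if label != previous and label != blank:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


# Toy example: 5 time steps over the labels {blank, 'a', 'b'}.
toy_probs = [
    [0.10, 0.80, 0.10],  # 'a'
    [0.20, 0.70, 0.10],  # 'a' again (repeat, merged with the previous step)
    [0.90, 0.05, 0.05],  # blank (separates the two 'a' characters)
    [0.10, 0.60, 0.30],  # 'a'
    [0.10, 0.20, 0.70],  # 'b'
]
print(best_path_decode(toy_probs, "ab"))
```

The blank label is what lets the decoder distinguish a genuinely doubled character from one character spread across several adjacent columns. Greedy decoding is cheap because it needs only one pass over the sequence, but the path it follows is the single most probable path and not necessarily the most probable label sequence.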
HUANG Jin-xing, born in 1995, postgraduate. Her main research interests include pattern recognition and image processing.