Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 230300101-8. doi: 10.11896/jsjkx.230300101
HUO Huaqi1, LU Lu1,2
Abstract: For text recognition in few-shot, language-independent scenarios, existing methods often suffer from low robustness and poor generalization. To address this, in the feature extraction stage a dual-stream network fusing spatial-domain and frequency-domain features is proposed: a deep residual convolutional branch extracts spatial-domain features, while a branch composed of a one-dimensional fast Fourier transform and a shallow neural network extracts frequency-domain features; the two kinds of features are then fused with a channel attention mechanism. In the sequence modeling stage, exploiting the characteristics of language-independent scenarios, a multi-scale one-dimensional convolution module is proposed to replace the bidirectional long short-term memory network. The full model is then assembled with an existing TPS rectification module and a CTC decoder. Training follows a transfer-learning scheme: the model is first pre-trained on a large English dataset and then fine-tuned on the target dataset. Experiments on two few-shot language-independent datasets compiled in this paper show that the proposed model outperforms existing models in accuracy, confirming its robustness and generalization ability in this scenario. In addition, experiments on five benchmark datasets for language-dependent scenarios (without fine-tuning) show that the method using the proposed feature extraction module outperforms the compared baselines, demonstrating the effectiveness and generality of the proposed dual-stream feature fusion network.
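The dual-stream fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the spatial stream is a random stand-in for ResNet features, the frequency branch applies a per-row 1-D real FFT followed by a single linear layer (standing in for the paper's shallow network), and the fusion is an SE-style channel attention (squeeze by global average pooling, excite by two dense layers and a sigmoid gate). All function names, shapes, and the reduction ratio are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def frequency_branch(x, rng):
    """Hypothetical frequency-domain branch: 1-D real FFT along the
    width axis, then one random linear layer back to the input width."""
    spec = np.abs(np.fft.rfft(x, axis=-1))            # (C, H, W//2+1)
    w = rng.standard_normal((spec.shape[-1], x.shape[-1])) * 0.01
    return spec @ w                                    # (C, H, W)

def channel_attention_fuse(spatial, freq, rng, reduction=4):
    """SE-style channel attention over the concatenated streams:
    squeeze (global average pool) -> two dense layers -> sigmoid gate."""
    feats = np.concatenate([spatial, freq], axis=0)    # (2C, H, W)
    c = feats.shape[0]
    squeeze = feats.mean(axis=(1, 2))                  # (2C,)
    w1 = rng.standard_normal((c, c // reduction)) * 0.1
    w2 = rng.standard_normal((c // reduction, c)) * 0.1
    gate = 1.0 / (1.0 + np.exp(-(np.maximum(squeeze @ w1, 0.0) @ w2)))
    return feats * gate[:, None, None]                 # channel-wise reweighting

spatial = rng.standard_normal((8, 4, 16))              # stand-in for ResNet features
fused = channel_attention_fuse(spatial, frequency_branch(spatial, rng), rng)
print(fused.shape)                                     # (16, 4, 16)
```

In a trained model the random weight matrices would be learned parameters; the sketch only shows the data flow: two streams of identical spatial shape, concatenated along the channel axis and reweighted per channel.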
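The multi-scale one-dimensional convolution module that replaces the BiLSTM in sequence modeling can likewise be sketched. This is an assumed reading of the abstract: parallel 1-D convolution branches with different kernel sizes over the time axis, with their same-padded outputs summed so each time step aggregates context at several receptive-field scales. Kernel sizes and the summation are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1d_same(seq, kernel):
    """'Same'-padded 1-D convolution along the time axis, applied to
    each feature channel independently (seq: (T, D), odd-length kernel)."""
    pad = len(kernel) // 2
    padded = np.pad(seq, ((pad, pad), (0, 0)))
    return np.stack([np.convolve(padded[:, d], kernel, mode="valid")
                     for d in range(seq.shape[1])], axis=1)

def multi_scale_conv(seq, rng, kernel_sizes=(3, 5, 7)):
    """Hypothetical multi-scale module: parallel branches with different
    kernel sizes, outputs summed to keep the (T, D) sequence shape."""
    out = np.zeros_like(seq)
    for k in kernel_sizes:
        kernel = rng.standard_normal(k) / k            # random stand-in weights
        out += conv1d_same(seq, kernel)
    return out

seq = rng.standard_normal((25, 32))    # 25 time steps of 32-dim visual features
ctx = multi_scale_conv(seq, rng)
print(ctx.shape)                       # (25, 32)
```

Unlike a BiLSTM, this module has no recurrent state, so all time steps are computed in parallel; context range is bounded by the largest kernel, which matches the abstract's premise that language-independent text needs no long-range linguistic dependencies.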