Computer Science ›› 2025, Vol. 52 ›› Issue (6): 179-186. doi: 10.11896/jsjkx.240500064

• Computer Graphics & Multimedia •

Ship License Plate Recognition Network Based on Pyramid Transformer in Transformer

WANG Teng1, XIAN Yunting1, XU Hao1, XIE Songqi1, ZOU Quanyi2

  1 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
  2 School of Journalism and Communication, South China University of Technology, Guangzhou 510006, China
  • Received: 2024-05-16 Revised: 2024-09-05 Online: 2025-06-15 Published: 2025-06-11
  • Corresponding author: XIAN Yunting (xianyt@scut.edu.cn)
  • About author: WANG Teng (cswangteng@mail.scut.edu.cn), born in 2000, postgraduate. His main research interests include image processing and deep learning.
    XIAN Yunting, born in 1982, Ph.D., lab master. His main research interests include artificial intelligence and image processing.
  • Supported by:
    Guangdong Philosophy and Social Science Foundation Regular Project (GD24YXW02) and Youth Innovative Talent Projects of Guangdong Universities (2023KQNC005).

Abstract: Ship identification is of great significance and is widely applied in the supervision of waterborne targets. The ship name is an important component of ship identity, and recognizing it accurately can compensate for the shortcomings of traditional AIS-based identification and improve identification accuracy. Compared with conventional Chinese text recognition, ship name recognition must cope with a complex water environment, large illumination changes, severe hull corrosion, and non-standard lettering. As a result, ship name images suffer from low clarity, incomplete characters, and inconsistent font styles, which make recognition difficult and reduce accuracy. To address these problems, this paper proposes a lightweight recognition network based on a Pyramid Transformer in Transformer. First, a spatial transformer network processes the input image to correct tilted ship names. Then, the Transformer in Transformer module extracts multi-granularity image features. Finally, characters and radicals are recognized at different scales. Experimental results show that the proposed algorithm performs well on ship name recognition compared with other text recognition methods: accuracy reaches 92.68% on the CSLD dataset, 94.50% on the SCSLD dataset, and 66.34% on the DCSLD dataset. The method also has a low parameter count and a high frame rate.

Key words: Chinese text recognition, Ship license plate recognition, Deep learning, Scene text recognition, Transformer
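
As a rough illustration of what "multi-granularity" input can look like in a Transformer-in-Transformer style design, the sketch below cuts a toy image into outer patches and then cuts each outer patch into smaller inner patches, so that features could be computed at both scales. The function names, patch sizes, and toy image are assumptions made for illustration only; they are not the configuration or code of the network described in the abstract, and the attention layers themselves are omitted.

```python
from typing import List

Grid = List[List[int]]


def split_into_patches(image: Grid, patch_h: int, patch_w: int) -> List[Grid]:
    """Cut a 2-D grid into non-overlapping patches, in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            rows = image[top:top + patch_h]
            patches.append([row[left:left + patch_w] for row in rows])
    return patches


def nested_tokens(image: Grid, outer: int, inner: int) -> List[List[Grid]]:
    """For every outer patch, return the list of inner sub-patches it contains."""
    outer_patches = split_into_patches(image, outer, outer)
    return [split_into_patches(patch, inner, inner) for patch in outer_patches]


# Hypothetical 8x16 "ship-name image"; each pixel value encodes its position.
image = [[r * 16 + c for c in range(16)] for r in range(8)]

# Assumed sizes: 8x8 outer patches (coarse scale), 4x4 inner patches (fine scale).
tokens = nested_tokens(image, outer=8, inner=4)
print(len(tokens), len(tokens[0]))  # prints: 2 4
```

Here the two outer patches stand in for coarse, character-sized regions, while the four inner patches inside each one stand in for finer, component-sized regions. A real model would embed both levels and let the inner-level features feed into the outer-level ones.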

CLC Number: TP183