Computer Science, 2022, Vol. 49, Issue (11A): 211100038-6. doi: 10.11896/jsjkx.211100038

• Image Processing & Multimedia Technology •

Multi-font Printed Uyghur-Kazakh-Kirghiz Keyword Image Recognition

SARDAR Parhat, ABDURAHMAN Kadir, ALIMJAN Yasin

  1. School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Online: 2022-11-10  Published: 2022-11-21
  • Corresponding author: ALIMJAN Yasin (81805794@qq.com)
  • About authors: SARDAR Parhat (sardar312@163.com), born in 1984, Ph.D. His main research interests include text and image information retrieval.
    ALIMJAN Yasin, born in 1985, Ph.D. His main research interests include programming languages and formal systems.
  • Supported by:
    National Natural Science Foundation of China (61662073), the 2020 Xinjiang Uyghur Autonomous Region Tianchi Doctor Plan Project, and the Xinjiang University of Finance and Economics School-level Scientific Research Foundation Project (2022XGC022, 2022XGC049).

Abstract: Existing work on printed Uyghur character recognition is limited to a single font, uses small recognition datasets and does not distinguish between application domains, while research on printed Kazakh and Kirghiz character recognition is lacking. To address these problems, a multi-font printed Uyghur, Kazakh and Kirghiz (hereafter Uyghur-Kazakh-Kirghiz) keyword recognition method based on a convolutional neural network (CNN) is proposed. First, because no Uyghur-Kazakh-Kirghiz keyword image corpus is available, image synthesis is used to construct a Uyghur-Kazakh-Kirghiz keyword image dataset covering 32 fonts. Second, data augmentation is applied to add noise, rotation and distortion of varying strength to the images, so that the dataset better reflects the characteristics of natural scenes. Finally, a multi-layer CNN is trained on this dataset as the image recognition model; it reaches recognition accuracies above 96.5% in all cases, and about 96% on a real printed-image recognition task covering 3 commonly used fonts. The method requires fewer preprocessing steps and outperforms previous recognition methods built on classical machine learning frameworks. Experimental results show that, within a CNN framework, a recognition method based on synthetic images and data augmentation can effectively accomplish multi-font printed Uyghur-Kazakh-Kirghiz image recognition.
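The abstract does not state how the noise, rotation and distortion are implemented. As a minimal sketch under stated assumptions, the NumPy code below applies those three kinds of augmentation to a grayscale keyword image (dark text on a white background, `uint8`): additive Gaussian noise, rotation about the image centre, and a sinusoidal row-shift warp as the distortion. All function names, the parameter ranges in `augment`, and the choice of a sine-wave warp are hypothetical illustrations, not the authors' settings.

```python
import numpy as np


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise (std. dev. `sigma`, 0-255 scale) and clip back to uint8."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def rotate(img, degrees, fill=255):
    """Rotate about the image centre using nearest-neighbour inverse mapping.

    Positive angles turn the content clockwise on screen, because row indices grow downward.
    Output pixels whose source falls outside the image are set to `fill` (white background).
    """
    h, w = img.shape
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    # For every output pixel, locate the source pixel that rotates onto it.
    src_x = np.rint(np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, fill)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


def wave_distort(img, amplitude, period, fill=255):
    """Shift each row horizontally by round(amplitude * sin(2*pi*row/period)) pixels."""
    h, w = img.shape
    out = np.full_like(img, fill)
    for y in range(h):
        shift = int(round(amplitude * np.sin(2.0 * np.pi * y / period)))
        shift = max(-w, min(w, shift))  # keep the slice bounds valid
        if shift >= 0:
            out[y, shift:] = img[y, :w - shift]
        else:
            out[y, :w + shift] = img[y, -shift:]
    return out


def augment(img, rng):
    """Apply rotation, distortion and noise with randomly drawn (assumed) strengths."""
    out = rotate(img, rng.uniform(-5.0, 5.0))
    out = wave_distort(out, amplitude=rng.uniform(0.0, 2.0), period=rng.uniform(8.0, 16.0))
    return add_gaussian_noise(out, sigma=rng.uniform(0.0, 20.0), rng=rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a synthesised keyword image: a dark bar on a white 32x96 canvas.
    img = np.full((32, 96), 255, dtype=np.uint8)
    img[12:20, 10:86] = 0
    variants = [augment(img, rng) for _ in range(5)]
    print(len(variants), variants[0].shape, variants[0].dtype)
```

Drawing the strength of each effect at random for every copy is one simple way to obtain the "different levels" of augmentation the abstract describes. Nearest-neighbour sampling keeps the sketch dependency-free; a production pipeline would more likely use an interpolating routine from an image-processing library.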

Key words: Uyghur-Kazakh-Kirghiz, OCR, Image synthesis, Convolutional neural network, Keyword image recognition

CLC number: TP391