Computer Science (计算机科学), 2022, Vol. 49, Issue 11A: 211100038-6. doi: 10.11896/jsjkx.211100038
沙尔旦尔·帕尔哈提, 阿布都热合曼·卡的尔, 阿力木江·亚森
SARDAR Parhat, ABDURAHMAN Kadir, ALIMJAN Yasin
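Abstract: Research on printed Uyghur text recognition suffers from limited font coverage, small datasets, and no differentiation between application domains, and recognition of Kazakh and Kyrgyz script remains little studied. To address these problems, this paper proposes a convolutional neural network (CNN) based method for recognizing multi-font printed keywords in Uyghur, Kazakh and Kyrgyz (hereafter Uyghur-Kazakh-Kyrgyz). First, because no keyword image corpus exists for these languages, a Uyghur-Kazakh-Kyrgyz keyword image dataset covering 32 fonts is built using image synthesis. Then, data augmentation applies varying degrees of noise, rotation and distortion to the images so that the dataset better reflects natural-scene conditions. Finally, a multi-layer CNN is trained on this dataset as the image recognition model, achieving recognition accuracy above 96.5% in all cases, and about 96% accuracy on a recognition task over real printed images in three commonly used fonts. The method reduces the preprocessing required and outperforms earlier recognition methods built on conventional machine learning frameworks. The experimental results show that, within a CNN framework, recognition based on synthetic images and data augmentation handles multi-font printed Uyghur-Kazakh-Kyrgyz image recognition well.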
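The augmentation step named in the abstract (noise, rotation, distortion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the nearest-neighbour resampling, the sine-wave form of the distortion, and all parameter ranges (rotation within ±5 degrees, noise sigma up to 0.1, wave amplitude up to 2 pixels) are assumptions chosen for illustration. Images are assumed to be grayscale arrays with values in [0, 1] and a white (1.0) background.

```python
import numpy as np


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)


def warp(img, inverse_map, fill=1.0):
    """Resample img with nearest-neighbour lookup.

    inverse_map takes grids of output (row, col) positions and returns the
    source (row, col) position to read from. Output pixels whose source
    falls outside the image receive the background value `fill`.
    """
    h, w = img.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r, src_c = inverse_map(rows.astype(float), cols.astype(float))
    src_r = np.rint(src_r).astype(int)
    src_c = np.rint(src_c).astype(int)
    inside = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.full((h, w), fill, dtype=float)
    out[inside] = img[src_r[inside], src_c[inside]]
    return out


def rotate(img, degrees):
    """Rotate the image about its centre by the given angle."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(degrees)

    def inverse_map(r, c):
        dy, dx = r - cy, c - cx
        return (cy + np.cos(t) * dy - np.sin(t) * dx,
                cx + np.sin(t) * dy + np.cos(t) * dx)

    return warp(img, inverse_map)


def wave_distort(img, amplitude, period):
    """Shift each column vertically along a sine wave (assumed distortion)."""

    def inverse_map(r, c):
        return r + amplitude * np.sin(2.0 * np.pi * c / period), c

    return warp(img, inverse_map)


def augment(img, rng):
    """Apply rotation, distortion and noise with randomly drawn strengths."""
    out = rotate(img, rng.uniform(-5.0, 5.0))            # assumed range
    out = wave_distort(out, rng.uniform(0.0, 2.0), 32.0)  # assumed range
    return add_gaussian_noise(out, rng.uniform(0.0, 0.1), rng)


# Stand-in for a synthesized keyword image: a dark bar on a white background.
sample = np.ones((32, 96))
sample[12:20, 20:76] = 0.0
augmented = augment(sample, np.random.default_rng(0))
```

Drawing the strength of each operation at random per image is one way to obtain the "varying degrees" of degradation the abstract mentions; a real pipeline would more likely rely on an image library with interpolated resampling.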