Computer Science (计算机科学), 2021, Vol. 48, Issue (12): 249-255. doi: 10.11896/jsjkx.200700072
KOU Xi-chao (寇喜超)¹, ZHANG Hong-rui (张鸿锐)¹, FENG Jie (冯杰)², ZHENG Ya-yu (郑雅羽)¹
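Abstract: Dewarping (distortion correction) is a basic preprocessing step for document OCR (Optical Character Recognition) and plays an important role in improving OCR accuracy. Document image dewarping usually relies on text extraction, but most current dewarping algorithms cannot accurately locate and analyze the text in complex documents, so their correction results are unsatisfactory. To address this problem, this paper proposes a text detection framework based on a fully convolutional network and trains it specifically on synthetic documents. The framework accurately extracts text information at three levels: characters, words, and text lines. The detected text is then adaptively sampled, and the page is modeled in 3D with a cubic function, which turns dewarping into a model-parameter optimization problem and makes it possible to correct complex document images. Experiments on synthetic warped documents and real test data show that the proposed method extracts text accurately from complex documents and clearly improves the visual quality of the dewarped images. Compared with other algorithms, it yields significantly higher OCR accuracy on the corrected images.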
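The abstract reduces dewarping to fitting a cubic-function model to text samples taken from the detector. Neither the paper's 3D page model nor its objective is given here, so the following is only a 2D simplification under stated assumptions: each text line is assumed to be available as a list of (x, y) sample points (for example, the output of a text-line detector), a single cubic y = f(x) is fitted to each line by least squares, and the line is straightened by shifting every sample vertically onto a horizontal baseline. The names `fit_cubic`, `eval_cubic`, `solve_linear` and `flatten_line` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# (coefficients c0..c3 in normalized u, center, scale), where u = (x - center) / scale
CubicModel = Tuple[List[float], float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: at least 4 distinct x values are needed")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_cubic(points: Sequence[Point]) -> CubicModel:
    """Least-squares fit of y = c0 + c1*u + c2*u^2 + c3*u^3 via the normal equations.

    x is first mapped to u in [-1, 1]; raw pixel coordinates would make the
    normal equations badly conditioned (x**6 terms reach ~1e18 for x ~ 1000).
    """
    xs = [x for x, _ in points]
    center = (max(xs) + min(xs)) / 2.0
    scale = (max(xs) - min(xs)) / 2.0 or 1.0
    ata = [[0.0] * 4 for _ in range(4)]  # A^T A, where rows of A are [1, u, u^2, u^3]
    aty = [0.0] * 4                      # A^T y
    for x, y in points:
        u = (x - center) / scale
        powers = [1.0, u, u * u, u * u * u]
        for i in range(4):
            aty[i] += powers[i] * y
            for j in range(4):
                ata[i][j] += powers[i] * powers[j]
    return solve_linear(ata, aty), center, scale


def eval_cubic(model: CubicModel, x: float) -> float:
    """Evaluate the fitted cubic at x using Horner's rule in normalized coordinates."""
    c, center, scale = model
    u = (x - center) / scale
    return c[0] + u * (c[1] + u * (c[2] + u * c[3]))


def flatten_line(points: Sequence[Point]) -> List[Point]:
    """Shift each sample vertically so the fitted curve becomes the horizontal
    line y = f(x_left). A 2D stand-in for the paper's 3D page model."""
    model = fit_cubic(points)
    baseline = eval_cubic(model, min(x for x, _ in points))
    return [(x, y - eval_cubic(model, x) + baseline) for x, y in points]


if __name__ == "__main__":
    # Hypothetical samples along one curled text line (pixel coordinates).
    line = [(float(x), 40.0 + 0.3 * x - 0.002 * x**2 + 0.000004 * x**3)
            for x in range(0, 1001, 50)]
    for x, y in flatten_line(line)[:3]:
        print(f"x={x:.0f}  y={y:.3f}")
```

Because the abstract describes a single page model whose parameters are optimized jointly, a fuller sketch would fit one surface to all sampled lines at once, for example with a nonlinear least-squares solver, rather than fitting independent per-line curves as above.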