Computer Science, 2021, Vol. 48, Issue (12): 249-255. doi: 10.11896/jsjkx.200700072

• Computer Graphics & Multimedia •

Distortion Correction Algorithm for Complex Document Image Based on Multi-level Text Detection

KOU Xi-chao1, ZHANG Hong-rui1, FENG Jie2, ZHENG Ya-yu1

  1 College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
  2 School of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
  • Received: 2020-07-13 Revised: 2021-01-28 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: ZHENG Ya-yu (yayuzheng@zjut.edu.cn)
  • Author e-mail: 1053556755@qq.com
  • About author: KOU Xi-chao, born in 1993, postgraduate. His main research interests include intelligent visual processing.
    ZHENG Ya-yu, born in 1978, Ph.D, associate researcher. His main research interests include embedded system applications, computer vision and image processing.
  • Supported by:
    National Natural Science Foundation of China (61501402).

Abstract: Document distortion correction is a basic step in document OCR (optical character recognition) and plays an important role in improving OCR accuracy. Document image distortion correction often depends on text extraction. However, most current document image correction algorithms cannot accurately locate and analyze the text in complex documents, so their correction results are unsatisfactory. To address this problem, a text detection framework based on a fully convolutional network is proposed, and the network is trained specifically on synthetic documents so that text information at three levels (characters, words, and text lines) can be acquired accurately. The text is then adaptively sampled and the page is modeled in three dimensions using a cubic function, which transforms the correction problem into a model parameter optimization problem and thereby corrects complex document images. Correction experiments on synthetic distorted documents and real test data show that the proposed method extracts text from complex documents accurately and clearly improves the visual quality of corrected complex document images. Compared with other algorithms, it significantly improves OCR accuracy after correction.

Key words: Convolutional neural network, Document image correction, Optical character recognition, Text detection, Three-dimensional modeling of documents

CLC Number:

  • TP391
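
The abstract states that the page is modeled with a cubic function and that correction is cast as a parameter optimization problem. The sketch below is only a simplified 2D illustration of that idea and not the authors' 3D page model: it assumes a single curved text line whose sampled points follow y = d + c*x + b*x^2 + a*x^3, estimates the four coefficients by ordinary least squares, and straightens the line by subtracting the fitted curve. The function names (`solve_linear`, `fit_cubic`, `eval_cubic`, `flatten_line`) and the choice of least squares as the optimizer are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation also updates the right-hand side.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least 4 distinct x values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_cubic(points: Sequence[Point]) -> List[float]:
    """Return least-squares coefficients [d, c, b, a] of a cubic curve.

    Hypothetical helper: it solves the normal equations (V^T V) w = V^T y,
    where V is the Vandermonde matrix of the sampled x coordinates.
    """
    vtv = [[sum(x ** (i + j) for x, _ in points) for j in range(4)]
           for i in range(4)]
    vty = [sum((x ** i) * y for x, y in points) for i in range(4)]
    return solve_linear(vtv, vty)


def eval_cubic(coeffs: Sequence[float], x: float) -> float:
    """Evaluate d + c*x + b*x^2 + a*x^3 for coeffs = [d, c, b, a]."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def flatten_line(points: Sequence[Point], coeffs: Sequence[float]) -> List[Point]:
    """Remove the fitted curvature so the text line becomes horizontal.

    The line is moved to the mean height of the input points (an assumed
    convention, chosen only to keep the result near the original position).
    """
    mean_y = sum(y for _, y in points) / len(points)
    return [(x, y - eval_cubic(coeffs, x) + mean_y) for x, y in points]
```

In use, `fit_cubic` would be called on points sampled along one detected text line, and `flatten_line` would map those points onto a straight baseline. A full dewarping pipeline would additionally need a camera or projection model and a joint optimization over all text lines, which this sketch does not attempt.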