Computer Science ›› 2021, Vol. 48 ›› Issue (12): 249-255. doi: 10.11896/jsjkx.200700072

• Computer Graphics & Multimedia •

Distortion Correction Algorithm for Complex Document Image Based on Multi-level Text Detection

KOU Xi-chao1, ZHANG Hong-rui1, FENG Jie2, ZHENG Ya-yu1   

  1 College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
  2 School of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
  • Received: 2020-07-13 Revised: 2021-01-28 Online: 2021-12-15 Published: 2021-11-26
  • About author: KOU Xi-chao, born in 1993, postgraduate. His main research interests include intelligent visual processing.
    ZHENG Ya-yu, born in 1978, Ph.D., associate researcher. His main research interests include embedded system applications, computer vision and image processing.
  • Supported by:
    National Natural Science Foundation of China (61501402).

Abstract: Document distortion correction is a basic step of document OCR (optical character recognition) and plays an important role in improving OCR accuracy. Document image distortion correction often depends on text extraction. However, most current document image correction algorithms cannot accurately locate and analyze the text in complex documents, which leads to unsatisfactory correction results. To address this problem, a text detection framework based on a fully convolutional network is proposed, and synthetic documents are used to train the network so that it accurately acquires text information at three levels: characters, words, and text lines. Through self-adaptive sampling of the text and three-dimensional modeling of the page with a cubic function, the correction problem is transformed into a model parameter optimization problem, which makes it possible to correct complex document images. Correction experiments on synthetic distorted documents and real test data show that the proposed method can accurately extract text from complex documents and significantly improves the visual quality of corrected complex document images. Compared with other algorithms, OCR accuracy after correction increases significantly.
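
The idea of turning correction into a parameter-fitting problem can be illustrated with a much simpler case than the paper's. The sketch below is not the authors' method: it assumes a single text line sampled at hypothetical points, fits a one-dimensional cubic to it by ordinary least squares, and subtracts the fitted curve to flatten the line. The function names (`fit_cubic`, `eval_cubic`, `flatten`) and the sample data are assumptions made for illustration; the abstract does not specify the paper's three-dimensional page model, its sampling scheme, or its optimizer.

```python
def fit_cubic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 + c3*x^3 via normal equations."""
    n = 4
    # Build the augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs


def eval_cubic(coeffs, x):
    """Evaluate the cubic at x using Horner's rule."""
    c0, c1, c2, c3 = coeffs
    return c0 + x * (c1 + x * (c2 + x * c3))


def flatten(xs, ys, coeffs):
    """Subtract the fitted curve so the sampled line lies at a constant height."""
    fitted = [eval_cubic(coeffs, x) for x in xs]
    mean = sum(fitted) / len(fitted)
    return [y - f + mean for y, f in zip(ys, fitted)]


# Hypothetical samples along one curved text line (normalized x, pixel y).
xs = [i / 19 for i in range(20)]
ys = [100.0 + 5.0 * x - 20.0 * x ** 2 + 30.0 * x ** 3 for x in xs]

coeffs = fit_cubic(xs, ys)
flat = flatten(xs, ys, coeffs)
```

In this toy setting the fitted coefficients recover the curve that generated the samples, and the flattened line has essentially constant height. A real page model would fit many text lines jointly and would account for camera projection, which this sketch leaves out.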

Key words: Convolutional neural network, Document image correction, Optical character recognition, Text detection, Three-dimensional modeling of documents

CLC Number: 

  • TP391