Distortion Correction Algorithm for Complex Document Image Based on Multi-level Text Detection

doi:10.11896/jsjkx.200700072

Abstract

Document distortion correction is a basic step in document OCR (optical character recognition) and plays an important role in improving OCR accuracy. Document image distortion correction often depends on text extraction. However, most current document image correction algorithms cannot accurately locate and analyze the text in complex documents, which leads to unsatisfactory correction results. To address this problem, a text detection framework based on a fully convolutional network is proposed, and synthetic documents are used to train the network so that it accurately acquires text information at three levels: characters, words, and text lines. Adaptive sampling of the text and three-dimensional modeling of the page with a cubic function transform the correction problem into a model parameter optimization problem, which makes it possible to correct complex document images. Correction experiments on synthetic distorted documents and real test data show that the proposed method accurately extracts text from complex documents and significantly improves the visual quality of the corrected images. Compared with other algorithms, OCR accuracy after correction increases significantly.
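The abstract does not give the exact form of the cubic page model or of the optimization, so the following is only a minimal sketch under stated assumptions. It assumes a one-dimensional page cross-section whose height is a cubic in the normalized horizontal position, is zero at both page edges, and is parameterized by its two end slopes. The names `cubic_profile`, `fit_profile`, `alpha`, and `beta` are hypothetical and do not come from the paper. Because this assumed model is linear in its two parameters, fitting it to sampled points reduces to a small least-squares problem, which illustrates in the simplest case how correction can become parameter estimation.

```python
def cubic_profile(x, alpha, beta):
    """Height of a page cross-section at normalized position x in [0, 1].

    Assumed form (not taken from the paper): a cubic that is zero at both
    page edges, with slope alpha at x = 0 and slope beta at x = 1.
    """
    return alpha * (x**3 - 2 * x**2 + x) + beta * (x**3 - x**2)


def fit_profile(samples):
    """Least-squares estimate of (alpha, beta) from (x, z) samples.

    The assumed model is linear in alpha and beta, so the 2x2 normal
    equations are solved in closed form with Cramer's rule.
    """
    spp = spq = sqq = spz = sqz = 0.0
    for x, z in samples:
        p = x**3 - 2 * x**2 + x  # basis function multiplying alpha
        q = x**3 - x**2          # basis function multiplying beta
        spp += p * p
        spq += p * q
        sqq += q * q
        spz += p * z
        sqz += q * z
    det = spp * sqq - spq * spq
    if abs(det) < 1e-12:
        raise ValueError("samples do not constrain both parameters")
    alpha = (spz * sqq - spq * sqz) / det
    beta = (spp * sqz - spq * spz) / det
    return alpha, beta


# Hypothetical usage: sample a known profile, then recover its parameters.
samples = [(i / 10, cubic_profile(i / 10, 0.3, -0.2)) for i in range(11)]
alpha, beta = fit_profile(samples)
```

A full correction pipeline would presumably optimize more parameters (for example camera pose and per-line positions) against detected text points, which generally requires an iterative nonlinear solver rather than this closed-form fit.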

Key words: Convolutional neural network, Document image correction, Optical character recognition, Text detection, Three-dimensional modeling of documents

CLC Number: TP391

KOU Xi-chao, ZHANG Hong-rui, FENG Jie, ZHENG Ya-yu. Distortion Correction Algorithm for Complex Document Image Based on Multi-level Text Detection [J]. Computer Science, 2021, 48(12): 249-255.


