Computer Science (计算机科学), 2021, Vol. 48, Issue (12): 249-255. doi: 10.11896/jsjkx.200700072

• Computer Graphics & Multimedia •

Distortion Correction Algorithm for Complex Document Image Based on Multi-level Text Detection

KOU Xi-chao1, ZHANG Hong-rui1, FENG Jie2, ZHENG Ya-yu1

  1 College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
  2 School of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
  • Received: 2020-07-13 Revised: 2021-01-28 Published: 2021-11-26
  • Corresponding author: ZHENG Ya-yu (yayuzheng@zjut.edu.cn)
  • About author: KOU Xi-chao, born in 1993, postgraduate. His main research interests include intelligent visual processing.
    ZHENG Ya-yu, born in 1978, Ph.D, associate researcher. His main research interests include embedded system application, computer vision and image processing.
  • Author e-mail: 1053556755@qq.com
  • Supported by:
    National Natural Science Foundation of China (61501402).

Abstract: Distortion correction is a basic preprocessing step for document OCR (optical character recognition) and plays an important role in improving OCR accuracy. Document image distortion correction often depends on text extraction. However, most current correction algorithms cannot accurately locate and analyze the text in complex documents, which leads to unsatisfactory correction results. To address this problem, a text detection framework based on a fully convolutional network is proposed, and the network is trained specifically on synthetic documents so that it accurately obtains text information at three levels: characters, words, and text lines. The text is then adaptively sampled and the page is modeled in three dimensions with a cubic function, which turns the correction problem into a model parameter optimization problem and makes it possible to correct complex document images. Correction experiments on synthetic distorted documents and real test data show that the proposed method extracts text from complex documents accurately and clearly improves the visual quality of corrected complex document images. Compared with other algorithms, it also significantly improves OCR accuracy after correction.

Key words: Convolutional neural network, Text detection, Three-dimensional modeling of documents, Document image correction, Optical character recognition
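
The abstract describes sampling points from detected text and fitting a cubic page model by parameter optimization. As a rough illustration of that idea only, the sketch below fits a cubic curve to points sampled along a single text line and removes the fitted curvature. It is a 2D simplification under stated assumptions, not the paper's 3D page model or its optimizer; the function names `fit_cubic_baseline` and `flatten_line` and the sample data are hypothetical.

```python
import numpy as np

def fit_cubic_baseline(xs, ys):
    """Least-squares fit of y = c3*x^3 + c2*x^2 + c1*x + c0 to sampled points.

    Returns the coefficients with the highest power first.
    """
    return np.polyfit(xs, ys, deg=3)

def flatten_line(xs, ys, coeffs):
    """Subtract the fitted curve so the line sits at the curve's mean height."""
    curve = np.polyval(coeffs, xs)
    return ys - curve + curve.mean()

# Hypothetical sample points along one curved text line (x normalized to [-1, 1]).
xs = np.linspace(-1.0, 1.0, 21)
true_coeffs = np.array([0.05, -0.10, 0.02, 0.30])  # assumed curvature, for illustration
ys = np.polyval(true_coeffs, xs)

coeffs = fit_cubic_baseline(xs, ys)
flat = flatten_line(xs, ys, coeffs)
```

In a full pipeline the sampled points would come from the character, word, and text-line detections, and the parameters of a shared page surface would be optimized jointly over all lines rather than one line at a time.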

CLC Number: TP391