Computer Science ›› 2022, Vol. 49 ›› Issue (9): 139-145. doi: 10.11896/jsjkx.220600032

• Computer Graphics & Multimedia •

Cross-image Text Reading Method Based on Text Line Matching

DAI Yu, XU Lin-feng

  1. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Received: 2022-06-02 Revised: 2022-07-08 Publication date: 2022-09-15 Release date: 2022-09-09
  • Corresponding author: XU Lin-feng (lfxu@uestc.edu.cn)
  • About author: DAI Yu (ydai@std.uestc.edu.cn), born in 1998, Ph.D. His main research interests include text detection and text recognition.
    XU Lin-feng, born in 1976, Ph.D., associate professor. His main research interests include visual attention, saliency detection, image and video coding, visual signal processing, and multimedia communication systems.
  • Supported by:
    National Natural Science Foundation of China (62071086), Sichuan Science and Technology Program (2021YFG0296), and Science and Technology Innovation (Seedling Project) Cultivation and Invention Creation Project in Sichuan Province (2021015).

Abstract: Reading text with a camera can help a computer understand text content. However, because of the camera's limited field of view and the complexity of Chinese text recognition, it is sometimes difficult for a computer to obtain the complete text content from a single text image. We therefore define the cross-image text reading task, which aims to read the complete text content from a pair of text images that share an overlapping region. For this task, we propose a cross-image text reading method based on text line matching. We first adopt a text detection network to crop text lines. We then design a text line matching network based on the multi-head self-attention mechanism to predict the matching relationships between text lines. Finally, we propose an editing-based text reading network that removes overlapping text and reads the text content. To train and evaluate the method, we construct the Cross-image Chinese Text Reading (CCTR) dataset. Experimental results on the CCTR dataset show that the proposed method achieves higher reading performance than pixel-level stitching and recognition methods, which demonstrates its superiority.

Key words: Cross-image text reading, Cross-image Chinese text reading dataset, Text line matching, Editing-based text reading, Attention mechanism

CLC Number:

  • TP391
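
The final stage described in the abstract, removing overlapping text once text lines from the two images have been matched, can be illustrated with a small sketch. This is not the editing-based text reading network from the paper. It is a plain string heuristic that assumes the two images sit side by side, that recognition is error-free, and that the matched line pairs are already given. The function names `overlap_length`, `merge_line_pair`, and `read_cross_image`, and the `min_overlap` parameter, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def overlap_length(left: str, right: str, min_overlap: int = 2) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` characters exists.
    """
    max_k = min(len(left), len(right))
    for k in range(max_k, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_line_pair(left: str, right: str, min_overlap: int = 2) -> str:
    """Concatenate two matched text lines, keeping the shared text only once."""
    k = overlap_length(left, right, min_overlap)
    return left + right[k:]


def read_cross_image(
    left_lines: List[str],
    right_lines: List[str],
    matches: List[Tuple[int, int]],
    min_overlap: int = 2,
) -> List[str]:
    """Merge recognized lines from two overlapping images.

    `matches` holds (i, j) pairs meaning left_lines[i] matches right_lines[j].
    In this sketch the pairs are an input; the paper predicts them with a
    text line matching network instead.
    """
    left_to_right = dict(matches)
    used_right = set(left_to_right.values())
    merged = []
    for i, left in enumerate(left_lines):
        if i in left_to_right:
            merged.append(
                merge_line_pair(left, right_lines[left_to_right[i]], min_overlap)
            )
        else:
            merged.append(left)  # line visible only in the left image
    # Lines visible only in the right image are appended unchanged.
    merged.extend(r for j, r in enumerate(right_lines) if j not in used_right)
    return merged


if __name__ == "__main__":
    left = ["Reading text with a cam", "help the computer under"]
    right = ["a camera can", "understand the text"]
    for line in read_cross_image(left, right, [(0, 0), (1, 1)]):
        print(line)
```

Exact suffix/prefix matching breaks as soon as the two images yield slightly different recognition results for the shared region, so the sketch only makes the input and output of the overlap-removal step concrete.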