Cross-image Text Reading Method Based on Text Line Matching

doi:10.11896/jsjkx.220600032

Abstract

Reading text with a camera can help a computer understand text content. However, because of the camera's limited field of view and the complexity of Chinese text recognition, it is sometimes difficult for the computer to read the complete text content from a single text image. We therefore define the cross-image text reading task, which aims to read the complete text content of a pair of overlapping text images, and propose a cross-image text reading method based on text line matching. First, a text detection network is used to crop text lines. Then, a text line matching network with a multi-head self-attention mechanism predicts the matching relationships between text lines. Finally, an editing-based text reading network removes the overlapping text and reads the complete text content. We also construct the cross-image Chinese text reading (CCTR) dataset for training and evaluation. Experimental results on the CCTR dataset show that the proposed method achieves higher reading performance than pixel-level stitching and recognition methods, indicating the advantage of the proposed approach.
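The match-then-merge idea in the abstract can be illustrated with a toy sketch. This is not the paper's method: it assumes the text lines have already been recognized as strings, replaces the learned text line matching network with a greedy matcher over exact suffix/prefix overlaps, and replaces the editing-based reading network with plain string splicing. The names `overlap_len`, `match_lines`, `read_cross_image`, and the `min_overlap` parameter are hypothetical and chosen only for illustration.

```python
def overlap_len(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def match_lines(lines_a, lines_b, min_overlap=2):
    """Greedy one-to-one matching (a stand-in for a learned matcher).

    Pairs with the largest text overlap are accepted first.
    Returns {index_in_a: (index_in_b, overlap_length)}.
    """
    candidates = sorted(
        ((overlap_len(a, b), i, j)
         for i, a in enumerate(lines_a)
         for j, b in enumerate(lines_b)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    matches, used_b = {}, set()
    for k, i, j in candidates:
        if k < min_overlap:
            break  # remaining candidates overlap too little to trust
        if i not in matches and j not in used_b:
            matches[i] = (j, k)
            used_b.add(j)
    return matches


def read_cross_image(lines_a, lines_b, min_overlap=2):
    """Merge matched lines by dropping the duplicated overlap; keep the rest."""
    matches = match_lines(lines_a, lines_b, min_overlap)
    matched_b = {j for j, _ in matches.values()}
    merged = []
    for i, a in enumerate(lines_a):
        if i in matches:
            j, k = matches[i]
            merged.append(a + lines_b[j][k:])  # splice without the overlap
        else:
            merged.append(a)
    # Lines that appear only in the second image are appended unchanged.
    merged.extend(b for j, b in enumerate(lines_b) if j not in matched_b)
    return merged


left_image = ["cross-image te", "text line mat"]
right_image = ["ge text reading", "e matching", "dataset"]
result = read_cross_image(left_image, right_image)
```

Exact string overlap only works when both crops are recognized without errors, which is one reason a learned matcher and an editing-based reader, as the abstract describes, would be preferable in practice.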

Key words: Cross-image text reading, Cross-image Chinese text reading dataset, Text line matching, Editing-based text reading, Attention mechanism

CLC Number: TP391

DAI Yu, XU Lin-feng. Cross-image Text Reading Method Based on Text Line Matching[J]. Computer Science, 2022, 49(9): 139-145.

