Computer Science, 2022, Vol. 49, Issue (9): 139-145. doi: 10.11896/jsjkx.220600032
DAI Yu (戴禹), XU Lin-feng (许林峰)
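Abstract: Reading text through a camera can help a computer understand text content. However, because of the camera's limited field of view and the complexity of Chinese text recognition, a computer sometimes cannot obtain the complete text content from a single text image. This paper therefore defines the cross-image text reading task, which aims to obtain the complete text content from a pair of text images that share an overlapping region. For this task, a cross-image text reading method based on text line matching is proposed. First, a text detection network is used to crop text lines. Then, a text line matching network based on the multi-head self-attention mechanism is designed to predict the matching relationships between text lines. Finally, an editing-based text reading network is proposed to remove overlapping text and read the text content. To train and evaluate the method, the Cross-image Chinese Text Reading Dataset (CCTR) is constructed. Experiments on CCTR show that the proposed method achieves higher reading performance than pixel-level stitching and recognition methods, which verifies its superiority.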
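The goal of the task, recovering one complete string from two readings that share an overlapping region, can be illustrated with a naive string-level sketch. This is not the editing-based reading network described in the abstract: it assumes the two text lines have already been matched and recognized without errors, and the function names `merge_overlapping` and `merge_matched_lines` are hypothetical.

```python
from typing import List, Tuple


def merge_overlapping(left: str, right: str, min_overlap: int = 1) -> str:
    """Join two recognized strings, dropping the duplicated overlap.

    Assumption for illustration: the overlap is the longest suffix of
    `left` that is also a prefix of `right`, with no recognition errors.
    """
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first, then shrink.
    for size in range(max_len, min_overlap - 1, -1):
        if size > 0 and left.endswith(right[:size]):
            return left + right[size:]
    # No overlap found: fall back to plain concatenation.
    return left + right


def merge_matched_lines(
    left_lines: List[str],
    right_lines: List[str],
    matches: List[Tuple[int, int]],
) -> List[str]:
    """Merge each matched pair (i, j) of text lines from the two images."""
    return [merge_overlapping(left_lines[i], right_lines[j]) for i, j in matches]


if __name__ == "__main__":
    print(merge_overlapping("cross-image te", "age text reading"))
    print(merge_overlapping("跨图文本阅", "文本阅读任务"))
```

Exact suffix/prefix matching fails as soon as either reading contains a recognition error inside the overlap, which is one reason a learned approach such as the one in the abstract may be preferable.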