Computer Science (计算机科学), 2020, Vol. 47, Issue (7): 135-140. doi: 10.11896/jsjkx.190600157
刘燕, 温静
LIU Yan, WEN Jing
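Abstract: Most traditional text detection methods follow a bottom-up pipeline. They usually start by detecting low-level semantic units such as characters or strokes, then filter out non-text components, construct text lines, and verify those lines. In complex scenes, the shape, scale, and layout of text, as well as its surroundings, vary drastically, so the human visual system performs text detection at different visual granularities. The performance of these bottom-up methods, by contrast, depends heavily on low-level feature detection, which makes it hard for them to adapt robustly to text features at different granularities. In recent years, deep learning methods have been applied to text detection to preserve text features at different resolutions. However, existing methods do not explicitly emphasize key feature information when extracting features at each layer of the network, and information is lost in the feature mappings between layers. As a result, some non-text objects are misclassified as text, detection is time-consuming, and many false and missed detections occur. To address this, an attention-based text detection method for complex scenes is proposed. Its main contribution is a visual attention layer introduced into VGG16, which uses the attention mechanism at a fine granularity to enhance the salient information within the network's global information. Experiments on Ubuntu with a GPU show that, on complex-scene text images, the method preserves the integrity of text regions, reduces fragmentation of the detected regions, and achieves a recall of up to 87% and a precision of 89%.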
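The abstract does not say how the visual attention layer is computed. As a rough illustration only, here is one common way an attention gate can reweight a convolutional feature map, such as a VGG16 convolution output. It applies a squeeze-and-excitation style channel gate, followed by a simplified CBAM-style spatial gate. Several parts are assumptions and not the authors' design: the function names, the weights `w1`/`w2`, and the per-pixel weighted sum that stands in for CBAM's 7×7 convolution.

```python
import math
from typing import List

# Feature map indexed as fmap[channel][row][col] (assumed layout for this sketch).
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Scale each channel by a gate in (0, 1) derived from its global average.

    w1 has shape hidden x C and w2 has shape C x hidden. Both are hypothetical
    weights that a real network would learn.
    """
    # Squeeze: global average pooling per channel -> vector of length C.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excite: small two-layer MLP with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(wrow, pooled))) for wrow in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(wrow, hidden))) for wrow in w2]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]


def spatial_attention(fmap: FeatureMap, w_avg: float = 1.0, w_max: float = 1.0,
                      bias: float = 0.0) -> FeatureMap:
    """Scale each spatial position by a gate built from channel-wise avg and max.

    CBAM feeds these two maps through a 7x7 convolution; this sketch uses a
    per-pixel weighted sum instead (simplifying assumption).
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gate = [[sigmoid(w_avg * sum(fmap[k][i][j] for k in range(c)) / c
                     + w_max * max(fmap[k][i][j] for k in range(c)) + bias)
             for j in range(w)] for i in range(h)]
    return [[[fmap[k][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
            for k in range(c)]


def attention_block(fmap: FeatureMap, w1: List[List[float]], w2: List[List[float]]) -> FeatureMap:
    """Channel gate followed by spatial gate; output has the input's shape."""
    return spatial_attention(channel_attention(fmap, w1, w2))
```

Every gate lies in (0, 1), so the block only attenuates activations and never amplifies them. With the default positive spatial weights, positions with stronger channel responses are attenuated least, and this relative weighting is what "emphasizing salient features" means here. In a trained detector the weights would be learned, and such a block would presumably sit after selected VGG16 convolution stages.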