Computer Science, 2021, Vol. 48, Issue (12): 243-248. doi: 10.11896/jsjkx.201000154
陈卓, 王国胤, 刘群
CHEN Zhuo, WANG Guo-yin, LIU Qun
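Abstract: Text in natural scenes is typically diverse and complex. Traditional natural scene text detection methods rely on hand-crafted features and therefore lack robustness, while existing deep-learning-based text detection methods lose important feature information as features are extracted layer by layer through the network. From the perspectives of multi-granularity and cognitive science, this paper proposes a natural scene text detection method that incorporates multi-granularity feature fusion. The main contribution of the method is to fuse the features of different granularities produced by a general-purpose feature extraction network and to add a residual channel attention mechanism, so that the model, having fully learned the feature information at different granularities in the image, focuses more on target feature information and suppresses useless information, which improves its robustness and accuracy. Experimental results show that, compared with other recent methods, the proposed method achieves an accuracy of 85.3% and an F-measure of 82.53% on public datasets, demonstrating better performance.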
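The two ideas named in the abstract, fusing features of different granularities and reweighting channels through a residual channel attention step, can be illustrated with a minimal NumPy sketch. This is not the authors' network: the feature shapes, the reduction ratio, the random weights, and the function names (`channel_attention`, `fuse_granularities`, and so on) are all assumptions chosen for illustration, and the convolutions that normally precede the attention step are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w_down, w_up):
    """Rescale each channel of feat (C, H, W) by a learned weight in (0, 1)."""
    squeeze = feat.mean(axis=(1, 2))            # global average pooling -> (C,)
    hidden = np.maximum(w_down @ squeeze, 0.0)  # bottleneck + ReLU -> (C // r,)
    weights = sigmoid(w_up @ hidden)            # per-channel gate -> (C,)
    return feat * weights[:, None, None]


def residual_channel_attention(feat, w_down, w_up):
    """Add the attention-weighted features back onto the input (skip connection)."""
    return feat + channel_attention(feat, w_down, w_up)


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_granularities(fine, coarse, w_down, w_up):
    """Bring the coarse map to the fine resolution, concatenate, then attend."""
    factor = fine.shape[1] // coarse.shape[1]
    merged = np.concatenate([fine, upsample_nearest(coarse, factor)], axis=0)
    return residual_channel_attention(merged, w_down, w_up)


# Hypothetical inputs: a fine 8x8 map and a coarse 4x4 map, 4 channels each.
rng = np.random.default_rng(0)
fine = rng.standard_normal((4, 8, 8))
coarse = rng.standard_normal((4, 4, 4))

channels, reduction = 8, 2  # 8 channels after concatenation; assumed ratio
w_down = rng.standard_normal((channels // reduction, channels)) * 0.1
w_up = rng.standard_normal((channels, channels // reduction)) * 0.1

fused = fuse_granularities(fine, coarse, w_down, w_up)
print(fused.shape)  # (8, 8, 8)
```

In this sketch every channel gate lies strictly between 0 and 1, so the skip connection keeps the original features intact while amplifying the channels with larger gates relative to the others. In a trained model the weights would be learned rather than random.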