Computer Science, 2021, Vol. 48, Issue (12): 243-248. doi: 10.11896/jsjkx.201000154

• Computer Graphics & Multimedia •

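Natural Scene Text Detection Algorithm Combining Multi-granularity Feature Fusion

CHEN Zhuo, WANG Guo-yin, LIU Qun

  1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2020-10-26  Revised: 2021-04-03  Published: 2021-12-15  Released online: 2021-11-26
  • Corresponding author: LIU Qun (liuqun@cqupt.edu.cn)
  • Author email: 512619302@qq.com
  • Supported by:
    Key Program of National Natural Science Foundation of China (61936001)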

  • About author: CHEN Zhuo, born in 1993, master. His main research interests include computer vision, etc.
    LIU Qun, born in 1969, Ph.D., professor, is a member of China Computer Federation. Her main research interests include data mining, complex networks, etc.

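Abstract: Text in natural scenes is typically diverse and complex. Because they rely on hand-crafted features, traditional natural scene text detection methods lack robustness, while existing deep-learning-based text detection methods lose important feature information as features are extracted layer by layer through the network. From the perspectives of multi-granularity and cognition, this paper proposes a natural scene text detection method that combines multi-granularity feature fusion. Its main contribution is to fuse features of different granularities taken from a general-purpose feature extraction network and to add a residual channel attention mechanism. Having fully learned the feature information at different granularities in the image, the model can then focus more on target features and suppress useless information, which improves its robustness and accuracy. Experimental results show that, compared with other recent methods, the proposed method achieves 85.3% accuracy and an F-value of 82.53% on public datasets, demonstrating better performance.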
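The abstract names two components without specifying their internals: fusing features of several granularities from a general-purpose backbone, and a residual channel attention mechanism. The dependency-free Python sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes channel attention in the style of RCAN (Zhang et al., ECCV 2018): global average pooling, a C to C/r to C bottleneck with ReLU and sigmoid, channel-wise rescaling, and an identity shortcut (RCAN's convolutions before the attention are omitted here). It also assumes fusion by nearest-neighbour upsampling to the finest resolution followed by channel concatenation. All helper names, the reduction ratio `r`, and the random weights are illustrative; a real model would use learned convolutions in a deep-learning framework.

```python
import math
import random

# Feature maps are nested lists: fmap[channel][row][col] (C x H x W).

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    # Dense matrix-vector product; w is a list of rows.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def channel_attention_weights(feat, w_down, w_up):
    # Global average pooling: one scalar descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    # Bottleneck: squeeze C -> C/r with ReLU, expand C/r -> C with sigmoid.
    hidden = [max(0.0, h) for h in matvec(w_down, z)]
    return [sigmoid(a) for a in matvec(w_up, hidden)]

def residual_channel_attention(feat, w_down, w_up):
    # Rescale each channel by its weight s_c and add the identity shortcut:
    # out = x + s_c * x.
    s = channel_attention_weights(feat, w_down, w_up)
    return [[[v + sc * v for v in row] for row in ch] for ch, sc in zip(feat, s)]

def upsample_nearest(ch, factor):
    # Nearest-neighbour upsampling of one H x W channel by an integer factor.
    return [[v for v in row for _ in range(factor)]
            for row in ch for _ in range(factor)]

def fuse_multi_granularity(features):
    # Hypothetical fusion rule: bring every map to the finest resolution,
    # then concatenate along the channel axis. `features` is ordered
    # fine -> coarse, and each coarser height must divide the finest height.
    h0 = len(features[0][0])
    fused = []
    for feat in features:
        factor = h0 // len(feat[0])
        fused.extend(upsample_nearest(ch, factor) for ch in feat)
    return fused

def rand_map(c, h, w):
    return [[[random.gauss(0, 1) for _ in range(w)] for _ in range(h)]
            for _ in range(c)]

random.seed(0)
# Stand-ins for backbone outputs at three granularities (8x8, 4x4, 2x2).
feats = [rand_map(4, 8, 8), rand_map(4, 4, 4), rand_map(4, 2, 2)]
fused = fuse_multi_granularity(feats)  # 12 channels, each 8 x 8
C, r = len(fused), 4                   # r: channel reduction ratio (assumed)
w_down = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
w_up = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
out = residual_channel_attention(fused, w_down, w_up)
print(len(out), len(out[0]), len(out[0][0]))  # 12 8 8
```

Because the sigmoid weights lie in (0, 1), each output channel equals its input scaled by a factor between 1 and 2, so the identity path keeps every channel's signal while the attention branch emphasizes some channels relative to others.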

Key words: Feature extraction, Multi-granularity information, Residual attention, Convolutional neural network

CLC number: TP391