Computer Science, 2021, Vol. 48, Issue (12): 243-248. doi: 10.11896/jsjkx.201000154

• Computer Graphics & Multimedia •

Natural Scene Text Detection Algorithm Combining Multi-granularity Feature Fusion

CHEN Zhuo, WANG Guo-yin, LIU Qun

  1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2020-10-26 Revised: 2021-04-03 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: LIU Qun (liuqun@cqupt.edu.cn)
  • Author e-mail: 512619302@qq.com
  • About author: CHEN Zhuo, born in 1993, master. His main research interests include computer vision.
    LIU Qun, born in 1969, Ph.D., professor, is a member of China Computer Federation. Her main research interests include data mining and complex networks.
  • Supported by:
    Key Program of National Natural Science Foundation of China (61936001).

Abstract: Text in natural scenes is typically diverse and complex. Traditional natural scene text detection methods rely on manually designed features and therefore lack robustness, while existing deep-learning-based text detection methods lose important feature information as features are extracted layer by layer through the network. From the perspectives of multi-granularity and cognitive science, this paper proposes a natural scene text detection method that incorporates multi-granularity feature fusion. Its main contribution is to fuse features of different granularities from a general-purpose feature extraction network and to add a residual channel attention mechanism, so that the model, having fully learned the feature information of different granularities in the image, pays more attention to target feature information and suppresses useless information, which improves its robustness and accuracy. Experimental results show that, compared with other recent methods, the method achieves 85.3% accuracy and an F-value of 82.53% on public datasets, indicating better performance.

Key words: Convolutional neural network, Feature extraction, Multi-granularity information, Residual attention
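
The fusion and attention steps described in the abstract can be illustrated with a minimal sketch. It is a toy in plain Python under stated assumptions, not the authors' network: the convolutional layers are omitted, the function names (`fuse`, `residual_channel_attention`, and the helpers) and the weight shapes are hypothetical, and the gating follows the general squeeze-and-rescale pattern of channel attention rather than any configuration given in the paper.

```python
import math

# Assumption: a feature map is a list of channels, each a 2D list (H x W) of floats.

def global_avg_pool(fmap):
    """Squeeze every channel to its mean value (one number per channel)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def dense(vec, weights, bias):
    """Fully connected layer; weights has shape out_dim x in_dim."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    out = []
    for ch in fmap:
        rows = []
        for i in range(len(ch) * factor):
            src = ch[i // factor]
            rows.append([src[j // factor] for j in range(len(src) * factor)])
        out.append(rows)
    return out

def fuse(fine, coarse, factor):
    """Fuse granularities: concatenate fine channels with upsampled coarse channels."""
    return fine + upsample_nearest(coarse, factor)

def residual_channel_attention(fmap, w_down, b_down, w_up, b_up):
    """Gate each channel with a learned weight in (0, 1), then add the input back.

    Hypothetical simplification: the residual branch is the gated input itself,
    so the output is x + gate * x for every value x in a channel.
    """
    squeezed = global_avg_pool(fmap)
    hidden = [max(0.0, h) for h in dense(squeezed, w_down, b_down)]  # ReLU bottleneck
    gates = [sigmoid(g) for g in dense(hidden, w_up, b_up)]          # one gate per channel
    return [[[x + g * x for x in row] for row in ch] for ch, g in zip(fmap, gates)]

# Toy inputs: one fine-grained 2x2 channel and one coarse-grained 1x1 channel.
fine = [[[1.0, 2.0], [3.0, 4.0]]]
coarse = [[[10.0]]]
fused = fuse(fine, coarse, factor=2)

# Zero weights are placeholders; every gate is then sigmoid(0) = 0.5.
out = residual_channel_attention(
    fused,
    w_down=[[0.0, 0.0]], b_down=[0.0],
    w_up=[[0.0], [0.0]], b_up=[0.0, 0.0],
)
```

With trained weights the gates would differ per channel, so informative channels are amplified and uninformative ones are damped, while the residual term keeps the original signal available to later layers.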

CLC Number: TP391