Scene Text Detection Based on Triple Segmentation

doi:10.11896/jsjkx.200800157

Abstract

Abstract: Scene text detection has advanced rapidly with the development of convolutional neural networks, yet two problems remain. First, many detectors output rectangular boxes, which localize irregularly shaped text poorly. Second, the boxes produced by some methods fail to separate adjacent text instances, causing detection errors on closely spaced text. To address both problems, this paper proposes a scene text detection method based on triple segmentation (TS). Text instances in an image are mapped to three regions: the score (whole-text) region, the kernel (core) region, and the threshold (border) region, yielding one segmentation map per region. The score map and the threshold map then guide the generation of the kernel map. Although the kernel map captures information such as the location and size of each text instance, it lacks boundary information, so the method uses the threshold (border) region to supervise the learning of the kernel region. Finally, instance segmentation on the kernel map yields a bounding polygon that fits each text kernel, and this polygon is expanded by a certain ratio to produce the detection result. Experiments show that the method reaches a precision of 83% on the ICDAR2015 dataset and improves the F-measure by more than 1% over existing detection algorithms; it also performs well on curved text.

Key words: Computer vision, Deep learning, Instance segmentation, Neural networks, Scene text detection
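
The last step described in the abstract, expanding the kernel polygon by a certain ratio, can be illustrated with a minimal sketch. The abstract does not state the expansion rule, so everything here is an assumption: the offset distance is taken as `area * ratio / perimeter` (a rule used by several segmentation-based detectors), the default `ratio=1.5` is a hypothetical value, and the function name `expand_convex_polygon` is made up for illustration. The sketch handles convex polygons only; arbitrary shapes would normally need a polygon-clipping library.

```python
from math import hypot


def polygon_area(points):
    """Signed area via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(
        hypot(points[(i + 1) % n][0] - points[i][0],
              points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    )


def expand_convex_polygon(points, ratio=1.5):
    """Push every edge of a convex polygon outward by an assumed distance.

    Assumption (not from the paper): distance = area * ratio / perimeter.
    """
    points = list(points)
    area = polygon_area(points)
    if area < 0:  # normalise orientation so the signed area is positive
        points.reverse()
        area = -area
    distance = area * ratio / polygon_perimeter(points)

    n = len(points)
    shifted = []  # one (point on shifted edge, edge direction) pair per edge
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        length = hypot(dx, dy)
        nx, ny = dy / length, -dx / length  # outward unit normal
        shifted.append(((x1 + nx * distance, y1 + ny * distance), (dx, dy)))

    expanded = []
    for i in range(n):
        (px, py), (dx1, dy1) = shifted[i - 1]  # edge ending at vertex i
        (qx, qy), (dx2, dy2) = shifted[i]      # edge starting at vertex i
        denom = dx1 * dy2 - dy1 * dx2
        if abs(denom) < 1e-12:  # collinear edges: keep the shifted vertex
            expanded.append((qx, qy))
        else:  # new vertex = intersection of the two shifted edge lines
            t = ((qx - px) * dy2 - (qy - py) * dx2) / denom
            expanded.append((px + t * dx1, py + t * dy1))
    return expanded


kernel = [(0, 0), (4, 0), (4, 2), (0, 2)]  # toy 4x2 text kernel
print(expand_convex_polygon(kernel, ratio=1.5))
```

For this toy rectangle the area is 8 and the perimeter is 12, so the assumed offset distance is 1 and each side moves outward by one unit.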

CLC Number: TP391

LI Huang (李煌), WANG Xiao-li (王晓莉), XIANG Xin-guang (项欣光). Scene Text Detection Based on Triple Segmentation[J]. Computer Science, 2020, 47(11): 142-147. https://doi.org/10.11896/jsjkx.200800157


