Computer Science, 2020, Vol. 47, Issue (11): 142-147. doi: 10.11896/jsjkx.200800157

• Computer Graphics & Multimedia •

Scene Text Detection Based on Triple Segmentation

LI Huang, WANG Xiao-li, XIANG Xin-guang   

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2020-08-24 Revised: 2020-09-16 Online: 2020-11-15 Published: 2020-11-05
  • About author: LI Huang, born in 1992, master. His main research interests include text detection.
    XIANG Xin-guang, born in 1982, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include multimedia analysis and computer vision.

Abstract: Scene text detection has advanced rapidly with the development of convolutional neural networks, but challenges remain. First, many detection algorithms use rectangular detection boxes, which cannot accurately localize irregularly shaped text. Second, some methods obtain bounding boxes but fail to separate text instances that lie very close to each other, which causes detection errors. To address these two problems, this paper proposes a novel triple segmentation (TS) method. Text instances in an image are mapped to a score area, a kernel area and a threshold area, producing three segmentation maps; the score map and the threshold map guide the generation of the kernel map. Although the kernel map carries information about the text in the image, such as location and size, it lacks threshold information, so the method uses the threshold map to constrain the generation of the kernel map. Detection is based on instance segmentation: the bounding polygon of each text kernel instance is extracted and then expanded. The algorithm achieves a precision of 83% on the ICDAR2015 dataset and outperforms existing methods by more than 1% in F-measure; the results also indicate that the method is effective for detecting curved text.
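The post-processing idea in the abstract (separate kernel instances, take a bounding polygon for each, then expand it) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `label_kernels`, `bounding_box_polygon` and `expand_convex_polygon` are hypothetical, the axis-aligned box stands in for the real bounding polygon, and the offset rule `d = area * ratio / perimeter` with `ratio = 1.5` is an assumption, since the abstract does not state how the expansion distance is chosen.

```python
from collections import deque
from math import hypot


def label_kernels(kernel_map):
    """Group 4-connected foreground pixels of a binary kernel map.

    Returns {instance_id: [(row, col), ...]}. Because kernels are smaller
    than the full text regions, neighbouring texts fall into separate groups.
    """
    rows, cols = len(kernel_map), len(kernel_map[0])
    seen = [[False] * cols for _ in range(rows)]
    instances = {}
    for r in range(rows):
        for c in range(cols):
            if not kernel_map[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and kernel_map[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            instances[len(instances) + 1] = pixels
    return instances


def bounding_box_polygon(pixels):
    """Simplification: axis-aligned box as a polygon of (x, y) vertices.

    Pixels are treated as unit squares, so the box spans [min, max + 1].
    """
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    x0, x1, y0, y1 = min(xs), max(xs) + 1, min(ys), max(ys) + 1
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def expand_convex_polygon(poly, ratio=1.5):
    """Push every edge of a convex polygon outward by d = area * ratio / perimeter.

    Assumed rule, not taken from the paper. Vertices must be ordered so that
    the shoelace area is positive, with no collinear consecutive edges.
    """
    n = len(poly)
    area2 = perimeter = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += hypot(x2 - x1, y2 - y1)
    if area2 <= 0:
        raise ValueError("polygon must have positive shoelace area")
    d = (area2 / 2.0) * ratio / perimeter
    # Each shifted edge is stored as (point on the line, direction).
    lines = []
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        length = hypot(dx, dy)
        lines.append(((x1 + dy / length * d, y1 - dx / length * d), (dx, dy)))
    # New vertex i is where shifted edge i-1 meets shifted edge i.
    expanded = []
    for i in range(n):
        (px, py), (ux, uy) = lines[i - 1]
        (qx, qy), (vx, vy) = lines[i]
        t = ((qx - px) * vy - (qy - py) * vx) / (ux * vy - uy * vx)
        expanded.append((px + t * ux, py + t * uy))
    return expanded


if __name__ == "__main__":
    # Toy kernel map: two kernels on one row, separated by background pixels.
    kernel_map = [[0, 1, 1, 1, 0, 0, 1, 1, 1, 0]]
    for label, pixels in label_kernels(kernel_map).items():
        print(label, expand_convex_polygon(bounding_box_polygon(pixels)))
```

A production pipeline would more likely trace real contours and use a polygon clipping library for the offset, which also handles non-convex shapes; the pure-Python version here only keeps the example self-contained.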

Key words: Computer vision, Deep learning, Instance segmentation, Neural networks, Scene text detection

CLC Number: 

  • TP391