Computer Science (计算机科学), 2017, Vol. 44, Issue (Z11): 8-18. doi: 10.11896/j.issn.1002-137X.2017.11A.002

• Review •

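Text Extraction in Video and Images: A Review

JIANG Meng-di, CHENG Jiang-hua, CHEN Ming-hui and KU Xi-shu

  1. College of Electronic Science and Engineering, National University of Defense Technology, Changsha 410073; College of Electronic Science and Engineering, National University of Defense Technology, Changsha 410073; Military Representative Office of the Rocket Force in Changsha, Changsha 410073; College of Electronic Science and Engineering, National University of Defense Technology, Changsha 410073
  • Online: 2018-12-01 Published: 2018-12-01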

Abstract: Text extraction from video and images has important practical value. In recent years, the big-data era has created an urgent demand for large-scale information retrieval, and many methods for extracting text from video and images have been proposed. This paper reviews algorithms for text extraction from video and images. Following the extraction pipeline, it divides the task into two main steps: text region detection and localization, and text segmentation. For each step, it analyzes and compares the scope of application and the relative advantages and disadvantages of existing algorithms. It then discusses public benchmark datasets and performance evaluation, lists important recent applications of text extraction from images, points out open problems in current research, and outlines promising directions for text extraction from video and scene images.

Key words: video and images, text extraction, text region detection and localization, text segmentation, review

