Computer Science, 2018, Vol. 45, Issue (8): 50-53. doi: 10.11896/j.issn.1002-137X.2018.08.009

• 2017 China Multimedia Conference •

Two-stage Method for Video Caption Detection and Extraction

WANG Zhi-hui¹, LI Jia-tong², XIE Si-yan², ZHOU Jia², LI Hao-jie¹, FAN Xin¹

  1. Department of International Information and Software Technology, Dalian University of Technology, Dalian, Liaoning 116621, China
  2. Department of Software Technology, Dalian University of Technology, Dalian, Liaoning 116621, China
  • Received: 2017-10-24  Online: 2018-08-29  Published: 2018-08-29
  • About the authors: WANG Zhi-hui (born 1982), female, Ph.D., associate professor, CCF member; main research interests: computer vision and image security; E-mail: zhwang@dlut.edu.cn. LI Jia-tong (born 1996), female; main research interests: image processing and deep learning. XIE Si-yan (born 1996), female; main research interest: digital media. ZHOU Jia (born 1996), female; main research interests: software engineering, finance and information services. LI Hao-jie (born 1973), male, Ph.D., professor, CCF member; main research interests: computer vision and multimedia information retrieval; E-mail: hjli@dlut.edu.cn (corresponding author). FAN Xin (born 1977), male, Ph.D., professor, CCF member; main research interests: computer vision and image processing, medical image analysis; E-mail: xin.fan@ieee.org.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61472059, 61772108).

Abstract: Video caption detection and extraction is one of the key technologies for video understanding. This paper proposes a two-stage algorithm that detects caption frames and caption regions separately, improving both detection efficiency and accuracy. The first stage detects caption frames. First, motion detection based on the gray correlation frame difference gives an initial judgment of the captions and yields a binary image sequence. Then, this sequence is screened a second time according to the dynamic characteristics of ordinary captions and scrolling captions to obtain the caption frames. The second stage detects and extracts the caption regions within the caption frames. First, the Sobel edge detection algorithm gives an initial detection of the text regions. Then, the background is eliminated using height and other constraints, and vertical and horizontal captions are distinguished by aspect ratio, so that all captions in the caption frames are obtained: static captions, ordinary captions and scrolling captions. The method reduces the number of frames that need to be examined and improves caption detection efficiency by about 11%. Experimental comparisons show that the method improves the F score by about 9% compared with methods that use the gray correlation frame difference and edge detection separately.

Key words: Detection and extraction, Dynamic characteristics, Gray correlation frame difference, Sobel edge detection, Video caption

CLC Number:

  • TP391
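
The two stages summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the thresholds (30 and 200) and the list-of-lists frame representation are all assumptions made for illustration. It uses a plain absolute frame difference rather than the gray correlation variant, and it omits the dynamic-feature screening and the height constraints, which the abstract does not specify.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image: rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def frame_difference(prev: Frame, curr: Frame, threshold: int = 30) -> Frame:
    """Stage 1 sketch: binary mask, 1 where |curr - prev| exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]


def changed_ratio(mask: Frame) -> float:
    """Fraction of pixels flagged as changed in a binary mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


def sobel_magnitude(img: Frame) -> Frame:
    """Stage 2 sketch: |Gx| + |Gy| with 3x3 Sobel kernels; borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def edge_bounding_box(edges: Frame, threshold: int = 200) -> Optional[Box]:
    """Bounding box of all edge pixels above threshold, or None if none."""
    points = [(y, x) for y, row in enumerate(edges)
              for x, value in enumerate(row) if value > threshold]
    if not points:
        return None
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    return min(ys), min(xs), max(ys), max(xs)


def orientation(box: Box) -> str:
    """Classify a region as horizontal or vertical by its aspect ratio."""
    top, left, bottom, right = box
    width, height = right - left + 1, bottom - top + 1
    return "horizontal" if width >= height else "vertical"
```

In this sketch, a frame would be passed to the Sobel stage only when `changed_ratio` indicates a change relative to the previous frame, which mirrors the abstract's point that filtering frames first reduces how many need region detection. A real pipeline would more likely operate on NumPy arrays and label each connected edge region separately instead of taking one bounding box per frame.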