Computer Science ›› 2020, Vol. 47 ›› Issue (1): 117-123. doi: 10.11896/jsjkx.190100231

• Computer Graphics & Multimedia •

Overview of Content-based Video Retrieval

HU Zhi-jun1,2, XU Yong3

  1. (Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China)1;
    (College of Computer Science & Technology, Guizhou University, Guiyang 550025, China)2;
    (Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China)3
  • Received: 2019-01-28 Published: 2020-01-19
  • Corresponding author: XU Yong (laterfall@hit.edu.cn)
  • About author: HU Zhi-jun, born in 1981, doctoral student, lecturer. His main research interests include fractal image compression and image and video retrieval; XU Yong, born in 1972, Ph.D, professor, Ph.D supervisor. His main research interests include pattern recognition, biometrics, machine learning and video analysis.
  • Supported by:
    This work was supported by the Foundation of Guizhou Provincial Key Laboratory of Public Big Data (2018BDKFJJ001).

Abstract: Video is the medium that carries the most information. With the rise of apps such as the Douyin short-video app, the number of videos on the network and in databases has increased dramatically, and manual labeling can no longer cope with the task of video retrieval. By extracting the spatial features of video frames or the temporal features between frames, video retrieval enables users to search and categorize videos more objectively and efficiently. This paper gives an overview of content-based video retrieval algorithms, summarizes some classical video retrieval algorithms, and reviews the research on and application of deep learning in content-based video retrieval. Finally, it analyzes the development prospects of deep learning in video retrieval.

Key words: Convolutional neural network, Feature extraction, Key frame, Shot segmentation, Video retrieval
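
The abstract's distinction between spatial features of a single frame and temporal features between frames can be illustrated with a small sketch. The code below is not taken from the paper and is not the method of any specific algorithm it surveys; it assumes a simple grey-level histogram as the spatial feature and the L1 distance between consecutive histograms as the temporal feature, then thresholds that distance to propose shot boundaries. All function names, the bin count, and the threshold are hypothetical choices made for illustration.

```python
import numpy as np


def frame_histogram(frame, bins=16):
    """Spatial feature (assumed): normalized grey-level histogram of one frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()


def temporal_differences(frames, bins=16):
    """Temporal feature (assumed): L1 distance between consecutive frame histograms."""
    hists = np.array([frame_histogram(f, bins) for f in frames])
    return np.abs(np.diff(hists, axis=0)).sum(axis=1)


def detect_shot_boundaries(frames, threshold=0.5, bins=16):
    """Return indices i where a cut is assumed between frame i-1 and frame i."""
    diffs = temporal_differences(frames, bins)
    return [int(i) + 1 for i in np.flatnonzero(diffs > threshold)]


# Synthetic clip (hypothetical data): 5 dark frames followed by 5 bright frames.
rng = np.random.default_rng(0)
dark = [rng.integers(0, 64, size=(32, 32)) for _ in range(5)]
bright = [rng.integers(192, 256, size=(32, 32)) for _ in range(5)]
clip = dark + bright

boundaries = detect_shot_boundaries(clip)
print(boundaries)
```

On this synthetic clip the only large histogram change is between the last dark frame and the first bright frame, so the sketch reports a single boundary at index 5. Real footage with gradual transitions or lighting changes would likely need an adaptive threshold or learned features instead of a fixed one.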

CLC Number: 

  • TP391