Computer Science (计算机科学) ›› 2020, Vol. 47 ›› Issue (1): 117-123. doi: 10.11896/jsjkx.190100231
HU Zhi-jun (胡志军)1,2, XU Yong (徐勇)3
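Abstract: Video is the medium that carries the most information. With the rise of apps such as the Douyin short-video platform, the number of videos on the Internet and in databases has grown sharply, and manual annotation can no longer keep up with the task of video retrieval. Video retrieval extracts the spatial features of video frames or the temporal features between frames, allowing users to search for and categorize videos more objectively and efficiently. This paper gives an overview of content-based video retrieval algorithms, summarizes some classical video retrieval algorithms, reviews the research on and application of deep learning in content-based video retrieval, and finally analyzes the development prospects of deep learning in video retrieval.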
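The distinction the abstract draws between spatial features (computed within one frame) and temporal features (computed between frames) can be illustrated with a minimal sketch. This is not the method of any particular algorithm surveyed in the paper; it assumes, purely for illustration, that a frame is a non-empty nested list of 8-bit gray levels, and the function names `gray_histogram`, `histogram_distance`, and `temporal_differences` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical frame format: rows of 8-bit gray levels (0..255).
Frame = Sequence[Sequence[int]]


def gray_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Spatial feature: normalized gray-level histogram of a single frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            # Map a gray level in 0..255 to one of `bins` equal-width bins.
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("frame must contain at least one pixel")
    return [c / total for c in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def temporal_differences(frames: Sequence[Frame], bins: int = 4) -> List[float]:
    """Temporal feature: histogram distance between each pair of consecutive frames."""
    hists = [gray_histogram(f, bins) for f in frames]
    return [histogram_distance(a, b) for a, b in zip(hists, hists[1:])]


if __name__ == "__main__":
    dark = [[0, 10], [20, 30]]
    bright = [[250, 240], [230, 220]]
    # Two identical dark frames followed by a bright one.
    print(temporal_differences([dark, dark, bright]))
```

A large value in the returned list suggests an abrupt change in content between two consecutive frames, while values near zero indicate visually similar frames; any threshold for deciding what counts as "large" would be an additional assumption.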