Computer Science ›› 2020, Vol. 47 ›› Issue (1): 117-123. doi: 10.11896/jsjkx.190100231

• Computer Graphics & Multimedia •

Overview of Content-based Video Retrieval

HU Zhi-jun 1,2, XU Yong 3

  1. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
  2. College of Computer Science & Technology, Guizhou University, Guiyang 550025, China
  3. Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
  • Received: 2019-01-28 Published: 2020-01-19
  • About author: HU Zhi-jun, born in 1981, doctoral student, lecturer. His main research interests include fractal image compression and image and video retrieval. XU Yong, born in 1972, Ph.D., professor, Ph.D. supervisor. His main research interests include pattern recognition, biometrics, machine learning and video analysis.
  • Supported by:
    This work was supported by the Foundation of Guizhou Provincial Key Laboratory of Public Big Data (2018BDKFJJ001).

Abstract: Video is an information-rich medium. With the rise of short-video apps such as Douyin, the number of videos on the network and in databases has increased dramatically, and manual labeling is no longer suitable for video retrieval. Retrieval based on the spatial features of video frames or the temporal features between frames enables users to search and categorize videos more objectively and efficiently. This paper summarizes content-based video retrieval algorithms, some classical video retrieval algorithms, and the research on and application of deep learning in content-based video retrieval. Finally, the development prospects of deep learning in video retrieval are analyzed.

Key words: Convolutional neural network, Feature extraction, Key frame, Shot segmentation, Video retrieval

CLC Number: 

  • TP391