Computer Science ›› 2015, Vol. 42 ›› Issue (11): 293-298.doi: 10.11896/j.issn.1002-137X.2015.11.060

Human Action Recognition by Visual Word Based on Local and Global Features

XIE Fei, GONG Sheng-rong, LIU Chun-ping and JI Yi   

  • Online:2018-11-14 Published:2018-11-14

Abstract: Unlike methods based on low-level features, human action recognition based on visual words adds mid-level semantic information to the features and thereby improves recognition accuracy. In complex backgrounds or dynamic scenes, however, the effectiveness of visual words may deteriorate. We propose a new method that combines local and global features to generate visual words. First, our approach uses a saliency map to detect the rectangles around the human. Then, inside these rectangles, 3D-SIFT is computed around interest points detected with a dynamic threshold matrix to describe local features. We also add HOOF to describe global motion information. The resulting visual words capture important semantic information in the video, such as brightness contrast and motion information. Compared with state-of-the-art methods, the proposed method improves action recognition performance by 6.4% on the KTH dataset and 6.5% on the UCF dataset. The experimental results also indicate that our visual dictionary has advantages over others in both simple-background and dynamic scenes.
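
The global-motion and visual-word steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common HOOF formulation (flow angles folded about the vertical axis, bins weighted by flow magnitude, histogram normalized to sum to 1) and a plain nearest-codeword assignment for building the visual-word histogram. The function names `hoof` and `bag_of_words`, the bin count, and the toy inputs are all hypothetical choices made for illustration.

```python
import numpy as np


def hoof(flow, n_bins=8, eps=1e-12):
    """Histogram of oriented optical flow for one frame (illustrative sketch).

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) flow vectors.
    Returns a length-n_bins histogram that sums to 1 (all zeros if no motion).
    """
    dx = flow[..., 0].ravel()
    dy = flow[..., 1].ravel()
    magnitude = np.hypot(dx, dy)
    # Fold left-pointing vectors onto the right half-plane, so the angle lies
    # in [-pi/2, pi/2] and leftward/rightward motion fall into the same bin.
    angle = np.arctan2(dy, np.abs(dx))
    bins = np.floor((angle + np.pi / 2) / np.pi * n_bins).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)  # angle == +pi/2 goes to the last bin
    hist = np.bincount(bins, weights=magnitude, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > eps else hist


def bag_of_words(descriptors, codebook):
    """Assign each descriptor to its nearest codeword (assumed Euclidean
    distance) and return the normalized visual-word histogram."""
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)


# Toy usage: uniform rightward flow and a two-word codebook.
toy_flow = np.zeros((4, 4, 2))
toy_flow[..., 0] = 1.0
toy_hoof = hoof(toy_flow)

toy_codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_descriptors = np.array([[1.0, 1.0], [9.0, 9.0], [11.0, 10.0]])
toy_words = bag_of_words(toy_descriptors, toy_codebook)
```

In a full pipeline, the codebook would typically be learned by clustering training descriptors, and the local (3D-SIFT) and global (HOOF) descriptors would be computed inside the saliency-detected rectangles; those steps are omitted here.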

Key words: Visual words, Saliency map, 3D-SIFT, Dynamic threshold matrix, HOOF

