Computer Science ›› 2015, Vol. 42 ›› Issue (11): 293-298. doi: 10.11896/j.issn.1002-137X.2015.11.060

• Graphics, Image and Pattern Recognition •

Human Action Recognition Based on Visual Words from Local and Global Features

XIE Fei, GONG Sheng-rong, LIU Chun-ping, JI Yi

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006
  • Online: 2018-11-14 Published: 2018-11-14
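  • Supported by:
    This work was supported by the National Natural Science Foundation of China: Research on Multi-Camera Object Tracking Based on Type-2 Fuzzy Probabilistic Graphical Models (61170124), Topic Discovery in Dynamic Scenes Based on Saliency and Trust Propagation (61272258), and Action Semantic Understanding of Temporal 3D Depth Maps Based on Deep Learning (61301299); and by the Jiangsu Province Industry-University-Research Joint Innovation Fund (Prospective Joint Research Project): Abnormal Behavior Analysis in Complex Scenes and Its Applications (BY2014059-14).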

Human Action Recognition by Visual Word Based on Local and Global Features

XIE Fei, GONG Sheng-rong, LIU Chun-ping and JI Yi   

  • Online: 2018-11-14 Published: 2018-11-14

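Abstract: Human action recognition based on visual words improves recognition accuracy because it adds mid-level semantic information to the features. However, mutual interference between foreground and background during visual word extraction weakens the expressive power of the visual words. This paper proposes a visual word generation method that combines local and global features. The method first uses a saliency map to detect the foreground human region, then applies the proposed dynamic threshold matrix so that spatio-temporal interest points are detected with different thresholds within the human region, and computes 3D-SIFT features around these points to describe local information. On this basis, histogram of optical flow features describe the global motion information of the action. Spectral clustering then fuses the local and global features into visual words. Experiments show that, compared with popular visual word generation methods based on local features, the proposed method improves the recognition rate by 6.4% over the average recognition rate on the simple-background KTH dataset and by 6.5% over the average recognition rate on the complex-background UCF dataset.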

Keywords: Visual words, Saliency map, 3D-SIFT, Dynamic threshold matrix, Histogram of optical flow
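Once a visual vocabulary exists, a video is typically represented as a histogram over its visual words. The sketch below illustrates only that encoding step, under loud assumptions: the vocabulary is taken as given (the spectral clustering that would produce it is not shown), nearest-center assignment with Euclidean distance is swapped in as a simplification, and the names `nearest_word`, `encode_video`, and the toy vectors are hypothetical and not taken from the paper.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary entry closest to the descriptor."""
    best_index, best_dist = 0, math.inf
    for index, center in enumerate(vocabulary):
        dist = math.dist(descriptor, center)  # Euclidean distance
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def encode_video(descriptors, vocabulary):
    """Build an L1-normalized visual word histogram for one video."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [count / total for count in counts]


# Toy data (assumption): each descriptor stands for a fused feature vector,
# e.g. local 3D-SIFT values concatenated with global optical flow values.
vocabulary = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
descriptors = [
    [0.1, 0.0, 0.9],
    [0.9, 1.0, 0.1],
    [1.0, 0.8, 0.0],
    [0.0, 0.2, 1.0],
]
histogram = encode_video(descriptors, vocabulary)
```

The resulting histogram is what a classifier would consume; how the paper weights local against global components before clustering is not stated in the abstract, so plain concatenation here is purely illustrative.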

Abstract: Unlike methods based on low-level features, human action recognition based on visual words adds mid-level semantic information to the features and thereby improves recognition accuracy. For complex backgrounds or dynamic scenes, however, the effectiveness of visual words may deteriorate. We propose a new method that combines local and global features to generate visual words. First, our approach uses a saliency map to detect the rectangles around humans. Inside these rectangles, 3D-SIFT is then computed around interest points detected with a dynamic threshold matrix to describe local features. We also add HOOF (histograms of oriented optical flow) to describe the global motion information. These visual words capture important semantic information in the video, such as brightness contrast and motion information. Compared with state-of-the-art methods, the proposed method improves action recognition performance by 6.4% on the KTH dataset and by 6.5% on the UCF dataset. The experimental results also indicate that our visual dictionary has advantages over others in both simple-background and dynamic scenes.

Key words: Visual words, Saliency map, 3D-SIFT, Dynamic threshold matrix, HOOF
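To make the HOOF descriptor concrete, here is a minimal sketch of a histogram of oriented optical flow for a single frame. It assumes the flow field has already been computed and is supplied as `(dx, dy)` pairs. The function name `hoof`, the default bin count, and the toy vectors are hypothetical, and this is not claimed to match the paper's exact configuration. Each vector is binned by its angle from the horizontal axis after mirroring about the vertical axis, so leftward and rightward motion share a bin, and each vote is weighted by flow magnitude.

```python
import math


def hoof(flow_vectors, num_bins=8):
    """Magnitude-weighted, L1-normalized histogram of flow orientations."""
    hist = [0.0] * num_bins
    for dx, dy in flow_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # zero flow carries no orientation
        # Mirror about the vertical axis: the angle lies in [-pi/2, pi/2].
        theta = math.atan2(dy, abs(dx))
        bin_index = int((theta + math.pi / 2) / math.pi * num_bins)
        bin_index = min(bin_index, num_bins - 1)  # theta == pi/2 goes to the last bin
        hist[bin_index] += magnitude
    total = sum(hist)
    if total == 0.0:
        return hist
    return [value / total for value in hist]


# Toy flow field (assumption): two horizontal vectors and one vertical vector.
example_flow = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0)]
example_hist = hoof(example_flow, num_bins=4)
```

Normalizing the histogram to sum to one makes the descriptor insensitive to the number of moving pixels, which is one reason this kind of feature suits describing global motion.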

