Advances on Human Action Recognition in Realistic Scenes

doi:10.11896/j.issn.1002-137X.2014.12.001

Abstract

Abstract: Human action recognition is both a difficult and an active research topic in computer vision. The mainstream research framework comprises three parts: action feature extraction, action representation, and recognition (classification) algorithms. Recognition of simple human actions in simple scenes has largely been solved, whereas action recognition in complex, realistic scenes still faces many difficulties. This paper reviews in detail the development of human action recognition in recent years and surveys current methods for realistic scenes from the perspectives of research scope, feature extraction, and action modeling. Unlike existing surveys, it covers new research topics and results from the past three years in China and abroad, such as the extraction and representation of pose features and human action representations based on sparse coding and on deep learning with convolutional neural networks. Finally, it discusses the open problems and difficulties in the field and possible directions for future work.

Key words: Human action recognition, Action feature extraction, Action representation, Computer vision

雷庆,陈锻生,李绍滋. 复杂场景下的人体行为识别研究新进展[J]. 计算机科学, 2014, 41(12): 1-7. https://doi.org/10.11896/j.issn.1002-137X.2014.12.001

LEI Qing, CHEN Duan-sheng, LI Shao-zi. Advances on Human Action Recognition in Realistic Scenes[J]. Computer Science, 2014, 41(12): 1-7. https://doi.org/10.11896/j.issn.1002-137X.2014.12.001

