Spatiotemporal Deep Convolutional Neural Networks and Their Application in Action Recognition

doi:10.11896/j.issn.1002-137X.2015.07.052

Abstract

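Abstract: In recent years, deep convolutional neural networks have made considerable progress in static image recognition, but their ability to model motion information in action videos remains weak. Yet motion information is the key that distinguishes action recognition from static image recognition. A spatiotemporal deep convolutional neural network is proposed based on products of filter responses. The network first divides the convolutional kernels corresponding to adjacent frames into two groups that approximately form pairs of Fourier basis functions. A subsequent multiplicative layer multiplies the responses produced by different frames pairwise and feeds the products into an additive layer for summation. Adjacent frames are thereby mapped onto the invariant subspaces associated with the eigenvalues of the transformation matrix, and the motion between them is detected from the rotation angle between the frames in those subspaces. Theoretical analysis shows that the network is sensitive to both motion and content. Experiments show that the network classifies action videos more accurately; compared with six other recently proposed algorithms, the results demonstrate the advantage of the proposed algorithm.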

Key words: Spatiotemporal domain, Convolutional neural network, Deep learning, Motion feature, Action recognition

Abstract: What distinguishes action recognition from other recognition tasks is the need to encode motion explicitly. So far, however, most work based on convolutional neural networks (CNNs) cannot properly handle the spatiotemporal interaction in video. We developed a spatiotemporal-CNN that explicitly exploits this important cue provided by video. Our approach is based on multiplying filter responses instead of summing them. Specifically, the spatiotemporal-CNN divides the convolutional kernels into two groups that form the sinusoids of a Fourier transform. A multiplicative layer then multiplies the responses of the convolutional kernels, as in computing a covariance, and the outputs are fed into a sum layer. In this way, the inputs and adjacent frames are mapped into the subspaces spanned by the eigenvectors, and specific geometric transformations or motion features can be detected from the rotation angles in those subspaces. Additional theoretical analysis shows that the spatiotemporal-CNN is sensitive to both motion and content. Experiments show that our approach produces more accurate classification than current algorithms.

Key words: Spatiotemporal, Convolutional neural networks, Deep learning, Motion feature, Action recognition
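
The mechanism summarized in the abstract (paired filters, pairwise products of responses from two frames, then summation, with motion read off as a rotation angle) can be illustrated with a toy one-dimensional sketch. This is not the authors' network: it assumes fixed cosine/sine filters instead of learned convolutional kernels, treats "motion" as a circular shift of a 1-D signal, and the names `fourier_filter_pair` and `rotation_angle` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def fourier_filter_pair(n, k):
    """Return a cosine/sine filter pair of frequency k over n samples.

    Assumption: these fixed filters stand in for the two kernel groups
    that, per the abstract, approximately form Fourier basis pairs.
    """
    phase = 2.0 * np.pi * k * np.arange(n) / n
    return np.cos(phase), np.sin(phase)


def rotation_angle(x, y, k=1):
    """Estimate the rotation between frames x and y in one 2-D subspace.

    Each frame is projected onto the filter pair; the responses of the
    two frames are multiplied pairwise and the products are summed.
    """
    cos_f, sin_f = fourier_filter_pair(len(x), k)
    xc, xs = cos_f @ x, sin_f @ x  # responses for frame x
    yc, ys = cos_f @ y, sin_f @ y  # responses for frame y

    # Sums of pairwise products (multiplicative step, then additive step).
    even = xc * yc + xs * ys  # proportional to cos(angle)
    odd = xc * ys - xs * yc   # proportional to sin(angle)

    angle = np.arctan2(odd, even)   # depends on the shift between frames
    energy = np.hypot(even, odd)    # depends on the frame content
    return angle, energy


# Toy example: the second "frame" is the first one shifted by 3 samples.
rng = np.random.default_rng(0)
frame = rng.standard_normal(32)
shifted = np.roll(frame, 3)

angle, energy = rotation_angle(frame, shifted, k=1)
estimated_shift = angle * len(frame) / (2.0 * np.pi)
print(estimated_shift, energy)
```

In this sketch the angle recovers the 3-sample shift up to floating-point error, while the magnitude of the summed products scales with how strongly the frame content excites the filter pair, which is one way to read the abstract's statement that the responses are sensitive to both motion and content.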

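LIU Cong, XU Wei-sheng, WU Qi-di. Spatiotemporal deep convolutional neural networks and their application in action recognition[J]. 计算机科学 (Computer Science), 2015, 42(7): 245-249. https://doi.org/10.11896/j.issn.1002-137X.2015.07.052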

LIU Cong, XU Wei-sheng, WU Qi-di. Spatiotemporal Convolutional Neural Networks and its Application in Action Recognition[J]. Computer Science, 2015, 42(7): 245-249. https://doi.org/10.11896/j.issn.1002-137X.2015.07.052
