Computer Science ›› 2015, Vol. 42 ›› Issue (7): 245-249. doi: 10.11896/j.issn.1002-137X.2015.07.052


Spatiotemporal Convolutional Neural Networks and its Application in Action Recognition

LIU Cong XU Wei-sheng WU Qi-di   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: What distinguishes action recognition from other recognition tasks is the need to encode motion explicitly. So far, however, most approaches based on convolutional neural networks (CNNs) cannot properly handle the spatiotemporal interaction in video. We developed a spatiotemporal CNN that explicitly exploits this important cue provided by video. Our approach multiplies filter responses instead of summing them. Specifically, the spatiotemporal CNN divides its convolutional kernels into two groups that form the sinusoids of the Fourier transform. A multiplicative layer then multiplies the responses of the convolutional kernels, which amounts to computing a covariance, and the products are fed into a sum layer. In this way, each input frame and its adjacent frame are mapped into the subspaces spanned by the eigenvectors, and specific geometric transformations or motion features can be detected from the rotation angles in those subspaces. Further theoretical analysis proves that the spatiotemporal CNN is sensitive to both motion and content. Experiments show that our approach classifies more accurately than existing algorithms.
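The multiply-then-sum idea in the abstract can be illustrated with a one-dimensional toy example. The sketch below is not the authors' network: it assumes a single fixed pair of Fourier filters (a cosine and a sine at one frequency), two 1-D "frames" related by a circular shift, and hypothetical function names (`quadrature_filters`, `rotation_angle`). It only shows how products of filter responses, once summed, expose the rotation angle that encodes the motion between frames.

```python
import numpy as np

def quadrature_filters(n, k):
    """Cosine and sine filters at frequency k (one Fourier pair)."""
    t = np.arange(n)
    return np.cos(2 * np.pi * k * t / n), np.sin(2 * np.pi * k * t / n)

def rotation_angle(x, y, k):
    """Angle between frames x and y in the 2-D subspace of frequency k.

    The filter responses are multiplied across the two frames and the
    products are summed; no response is used on its own.
    """
    cos_f, sin_f = quadrature_filters(len(x), k)
    a, b = cos_f @ x, sin_f @ x   # responses of frame x
    c, d = cos_f @ y, sin_f @ y   # responses of frame y
    same = a * c + b * d          # sum of products, proportional to cos(angle)
    cross = a * d - b * c         # sum of products, proportional to sin(angle)
    return np.arctan2(cross, same)

# Toy data (assumption): a random 1-D frame and a copy shifted by 5 samples.
rng = np.random.default_rng(0)
n, k, true_shift = 64, 1, 5
frame = rng.standard_normal(n)
shifted = np.roll(frame, true_shift)

angle = rotation_angle(frame, shifted, k)
estimated_shift = angle * n / (2 * np.pi * k)
print(estimated_shift)
```

In this toy setting the angle depends only on the shift, while the magnitude of the summed products scales with the energy of the frame at that frequency, which is one way to read the claim that the representation responds to both motion and content.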

Key words: Spatiotemporal, Convolutional neural networks, Deep learning, Motion feature, Action recognition

