Review of Human Action Recognition Technology Based on 3D Convolution

doi:10.11896/jsjkx.200100094

Abstract: As the economy and society develop, video analysis tasks are attracting growing attention, and human action recognition is now widely used in virtual reality, video surveillance, video retrieval, and related fields. Traditional human action recognition methods process the input video with 2D convolution, which can extract only spatial features, and recognition based on manually extracted features struggles in complex environments. Following the success of deep learning in image classification, two approaches have therefore emerged: two-stream networks based on deep learning, and 3D convolution, which extracts temporal and spatial features simultaneously. 3D convolution has developed rapidly in recent years and has given rise to a variety of classic architectures, each with its own characteristics, its own optimization methods, and its own gains in speed and accuracy. This paper summarizes several mainstream 3D convolutional frameworks and compares them on the corresponding datasets, identifying the strengths and weaknesses of each so that the framework best suited to a given practical setting can be chosen.
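The contrast the abstract draws between spatial-only and spatiotemporal feature extraction can be illustrated with a minimal NumPy sketch. This is not code from the paper: the function name `conv3d_valid`, the toy clips, and the hand-picked kernels are all assumptions made for illustration, and real frameworks learn their kernels rather than fixing them by hand.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation.

    clip:   array of shape (T, H, W), a grayscale video clip
    kernel: array of shape (kt, kh, kw)
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                window = clip[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(window * kernel)
    return out

# Toy clips (hypothetical data): 4 frames of 5x5 pixels.
static_clip = np.ones((4, 5, 5))      # nothing changes over time
moving_clip = np.zeros((4, 5, 5))
for t in range(4):
    moving_clip[t, 2, t] = 1.0        # one bright pixel moving right each frame

# A 2D (spatial-only) kernel is a 3D kernel with temporal extent 1:
# each output frame depends on exactly one input frame.
spatial_kernel = np.ones((1, 3, 3))

# A kernel spanning 2 frames: next frame minus current frame.
temporal_kernel = np.stack([-np.ones((3, 3)), np.ones((3, 3))])

spatial_out = conv3d_valid(moving_clip, spatial_kernel)
static_motion = conv3d_valid(static_clip, temporal_kernel)
moving_motion = conv3d_valid(moving_clip, temporal_kernel)

print(spatial_out.shape, moving_motion.shape)
print(np.abs(static_motion).max(), np.abs(moving_motion).max())
```

With a temporal extent of 1 the kernel never mixes frames, so it cannot distinguish a moving pixel from the same pixel shown in shuffled frame order. The two-frame kernel responds only where the content changes between adjacent frames, which is the kind of temporal cue a 3D convolution can capture.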

Key words: 3D convolution, Classification, Feature extraction, Human action recognition, Video analysis

CLC Number: TP391

HUANG Hai-xin, WANG Rui-peng, LIU Xiao-yang. Review of Human Action Recognition Technology Based on 3D Convolution [J]. Computer Science, 2020, 47(11A): 139-144.


