Computer Science ›› 2020, Vol. 47 ›› Issue (11A): 139-144. doi: 10.11896/jsjkx.200100094

• Computer Graphics & Multimedia •

Review of Human Action Recognition Technology Based on 3D Convolution

HUANG Hai-xin, WANG Rui-peng, LIU Xiao-yang

  1. School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 210100, China
  • Online: 2020-11-15 Published: 2020-11-17
  • Corresponding author: HUANG Hai-xin (huanghaixin@sylu.edu.cn)
  • About author: HUANG Hai-xin, born in 1973, Ph.D., associate professor. Her main research interests include machine learning, artificial intelligence and intelligent grid.

Abstract: With economic and social development, video analysis tasks are receiving growing attention, and human action recognition has been widely applied in virtual reality, video surveillance, video retrieval, and related fields. Traditional human action recognition methods process the input video with 2D convolution, but 2D convolution can extract only spatial features, while methods based on hand-crafted features struggle in complex environments. Therefore, following the success of deep learning in image classification, deep-learning-based two-stream networks and 3D convolution, which extracts spatial and temporal features simultaneously, have emerged. 3D convolution has developed rapidly in recent years and has produced a variety of classic architectures, each with different characteristics, its own optimization methods, and its own gains in speed and accuracy. This paper summarizes several mainstream 3D convolutional frameworks and compares them on the corresponding datasets to identify the advantages and disadvantages of each, so that the framework best suited to a given practical scenario can be chosen.

Key words: 3D convolution, Classification, Feature extraction, Human action recognition, Video analysis
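
The contrast the abstract draws between 2D and 3D convolution can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function `conv3d_valid`, the toy clips, and the kernels are hypothetical names chosen for illustration, and the loop-based implementation is a plain "valid" cross-correlation, not an optimized layer. A kernel with temporal extent 1 behaves like a per-frame 2D filter, while a kernel spanning two frames responds to change over time.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over (time, height, width)."""
    t, h, w = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Weighted sum over a small spatio-temporal cube.
                out[i, j, k] = np.sum(clip[i:i + kt, j:j + kh, k:k + kw] * kernel)
    return out

rng = np.random.default_rng(0)
frame = rng.random((6, 6))

# Static clip: the same frame repeated four times (no motion).
static_clip = np.stack([frame] * 4)
# Moving clip: the frame shifts one pixel to the right at every time step.
moving_clip = np.stack([np.roll(frame, s, axis=1) for s in range(4)])

# Temporal extent 1: equivalent to a 2D averaging filter applied per frame.
spatial_kernel = np.ones((1, 3, 3)) / 9.0
# Temporal extent 2: difference between consecutive frames (assumed toy kernel).
temporal_kernel = np.array([-1.0, 1.0]).reshape(2, 1, 1)

spatial_response = conv3d_valid(static_clip, spatial_kernel)
static_response = conv3d_valid(static_clip, temporal_kernel)
moving_response = conv3d_valid(moving_clip, temporal_kernel)
```

Under these assumptions, the frame-difference kernel yields an all-zero response on the static clip and a non-zero response on the moving clip, whereas the per-frame kernel never combines values from different frames.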

CLC Number: TP391