Computer Science (计算机科学), 2020, Vol. 47, Issue 11A: 139-144. doi: 10.11896/jsjkx.200100094
HUANG Hai-xin (黄海新), WANG Rui-peng (王瑞鹏), LIU Xiao-yang (刘孝阳)
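Abstract: As the economy and society develop, video analysis tasks are receiving increasing attention, and human action recognition has been widely applied in virtual reality, video surveillance, video retrieval, and other fields. Traditional human action recognition methods process the input video with 2D convolutions, but 2D convolutions can extract only spatial features, while methods based on hand-crafted features struggle in complex environments. Against the backdrop of deep learning's success in image classification, two-stream networks based on deep learning and 3D convolutions, which extract spatial and temporal features simultaneously, have emerged. 3D convolution has developed rapidly in recent years and has given rise to several classic architectures, each with its own characteristics, its own optimization methods, and its own gains in speed and accuracy. By summarizing several mainstream 3D convolution frameworks and comparing them on the corresponding datasets, the strengths and weaknesses of each can be identified, so that the framework best suited to a given practical scenario can be chosen.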
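The contrast the abstract draws between 2D and 3D convolution can be illustrated with a minimal sketch. Everything in it is an illustrative assumption rather than material from the paper: the function names, the toy two-frame 2x2 clips, the kernel values, and the use of global average pooling to aggregate features. It is written in plain Python so it runs without any library. A 2D kernel applied frame by frame yields the same pooled feature for a clip and its time-reversed copy, whereas a 3D kernel that spans two frames responds to the direction of motion.

```python
def conv3d_valid(video, kernel):
    """Naive 'valid' 3D cross-correlation on nested lists indexed [t][y][x]."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [
        [
            [
                sum(
                    video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                    for dt in range(kt)
                    for dy in range(kh)
                    for dx in range(kw)
                )
                for x in range(W - kw + 1)
            ]
            for y in range(H - kh + 1)
        ]
        for t in range(T - kt + 1)
    ]


def conv2d_per_frame(video, kernel2d):
    """2D convolution on each frame independently: a 3D kernel of temporal extent 1."""
    return conv3d_valid(video, [kernel2d])


def global_mean(volume):
    """Average every value in a [t][y][x] feature volume (global average pooling)."""
    values = [v for plane in volume for row in plane for v in row]
    return sum(values) / len(values)


# Toy clips (hypothetical data): one bright pixel moving right, and the reverse.
frame_a = [[1, 0], [0, 0]]
frame_b = [[0, 1], [0, 0]]
clip_fwd = [frame_a, frame_b]
clip_rev = [frame_b, frame_a]

# Arbitrary 2D kernel; its output for a frame depends on that frame alone.
k2 = [[1, 2], [3, 4]]
# 3D kernel spanning two frames: temporal difference of the top-left pixel.
k3 = [[[-1, 0], [0, 0]], [[1, 0], [0, 0]]]

pooled_2d_fwd = global_mean(conv2d_per_frame(clip_fwd, k2))
pooled_2d_rev = global_mean(conv2d_per_frame(clip_rev, k2))
pooled_3d_fwd = global_mean(conv3d_valid(clip_fwd, k3))
pooled_3d_rev = global_mean(conv3d_valid(clip_rev, k3))

print(pooled_2d_fwd, pooled_2d_rev)  # same value: frame order is lost
print(pooled_3d_fwd, pooled_3d_rev)  # opposite signs: motion direction is kept
```

The per-frame 2D features are not identical frame by frame; they are the same set of values in a different order, which is why any order-insensitive aggregation over time cannot separate the two clips. The 3D kernel mixes neighbouring frames before any pooling happens, so temporal order is already encoded in its output.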