Computer Science (计算机科学), 2020, Vol. 47, Issue 9: 123-128. doi: 10.11896/jsjkx.190800101
HE Xin (何鑫)¹, XU Juan (许娟)¹,², JIN Ying-ying (金莹莹)¹
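Abstract: The Temporal Segment Network (TSN) is currently a common approach for modeling complete actions in videos, but it cannot fully capture how an action changes over time. To exploit this change information along the temporal dimension, this paper proposes the Action-Related Network (ARN). ARN first uses a BN-Inception network to extract action features from the video, then concatenates the extracted segment features with the features output by a Long Short-Term Memory (LSTM) module, and finally performs classification. In this way, ARN accounts for both the static and the dynamic information of an action. Experimental results show that on the general-purpose HMDB-51 dataset, ARN reaches a recognition accuracy of 73.33%, 7% higher than TSN; when more action information is provided, ARN's accuracy exceeds TSN's by more than 10%. On Something-Something V1, a dataset whose actions involve more temporal change, ARN reaches 28.12%, 51% higher than TSN. Finally, on several action categories of HMDB-51, the paper further analyzes how the accuracies of ARN and TSN change when each uses more complete action information; the results show that ARN's per-category accuracy exceeds TSN's by more than 10 percentage points. By relating the changes within an action, ARN therefore makes fuller use of complete action information and effectively improves the recognition accuracy for actions that change over time.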
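The abstract describes three stages: per-segment features from BN-Inception, an LSTM run over the segment sequence, and concatenation of the two before classification. The sketch below illustrates that data flow in plain Python under loudly stated assumptions: random vectors stand in for BN-Inception segment features, the "static" branch is assumed to be a TSN-style average over segments, the "dynamic" branch is assumed to be the final hidden state of a single-layer LSTM, and the classifier is one untrained linear layer. The helper names (`TinyLSTM`, `segment_consensus`, `arn_style_logits`) and all dimensions are hypothetical; this is not the paper's implementation.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: Vector) -> Vector:
    # Dense matrix-vector product: one output value per row of w.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small random weights; nothing in this sketch is trained.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTM:
    """Hypothetical single-layer LSTM (no bias terms), unrolled over segments."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random) -> None:
        self.hidden = hidden
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.w: Dict[str, Matrix] = {
            gate: rand_matrix(hidden, in_dim + hidden, rng) for gate in "ifog"
        }

    def run(self, seq: List[Vector]) -> Vector:
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in seq:
            xh = x + h  # list concatenation [x_t, h_{t-1}]
            i = [sigmoid(z) for z in matvec(self.w["i"], xh)]    # input gate
            f = [sigmoid(z) for z in matvec(self.w["f"], xh)]    # forget gate
            o = [sigmoid(z) for z in matvec(self.w["o"], xh)]    # output gate
            g = [math.tanh(z) for z in matvec(self.w["g"], xh)]  # candidate cell
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h  # final hidden state depends on segment order


def segment_consensus(seq: List[Vector]) -> Vector:
    # Assumed TSN-style average consensus: an order-independent ("static") summary.
    return [sum(col) / len(seq) for col in zip(*seq)]


def arn_style_logits(seq: List[Vector], lstm: TinyLSTM, w_cls: Matrix) -> Vector:
    static = segment_consensus(seq)
    dynamic = lstm.run(seq)
    fused = static + dynamic  # feature concatenation
    return matvec(w_cls, fused)  # one linear classifier over the fused vector


# Usage with toy sizes (hypothetical; real backbone features are much larger).
rng = random.Random(0)
FEAT, HID, CLASSES, SEGMENTS = 8, 4, 51, 3  # 51 classes as in HMDB-51
segments = [[rng.gauss(0.0, 1.0) for _ in range(FEAT)] for _ in range(SEGMENTS)]
lstm = TinyLSTM(FEAT, HID, rng)
w_cls = rand_matrix(CLASSES, FEAT + HID, rng)
logits = arn_style_logits(segments, lstm, w_cls)
predicted = max(range(CLASSES), key=lambda k: logits[k])
print(len(logits), predicted)
```

Reversing the segment order leaves the averaged branch unchanged (up to floating-point rounding) but changes the LSTM branch. That is the sense in which concatenating the two can carry both static and order-dependent (dynamic) information, whereas an average alone cannot distinguish an action from its time-reversed version.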