Computer Science ›› 2020, Vol. 47 ›› Issue (9): 123-128. doi: 10.11896/jsjkx.190800101

• Computer Graphics & Multimedia •

Action-related Network: Towards Modeling Complete Changeable Action

HE Xin1, XU Juan1,2, JIN Ying-ying1   

  1. 1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
    2 Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 210096, China
  • Received: 2019-08-22 Published: 2020-09-10
  • Corresponding author: XU Juan (juanxu@nuaa.edu.cn)
  • About author: HE Xin (hexin@nuaa.edu.cn)

Action-related Network: Towards Modeling Complete Changeable Action

HE Xin1, XU Juan1,2, JIN Ying-ying1   

  1. 1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
    2 Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 210096, China
  • Received: 2019-08-22 Published: 2020-09-10
  • About author: HE Xin, born in 1995, postgraduate, is a member of China Computer Federation. His main research interests include deep learning and action recognition.
    XU Juan, born in 1981, associate professor, is a member of China Computer Federation. Her main interests include quantum computing and quantum information, cloud computing and deep learning.

Abstract: For modeling complete actions in video, the commonly used method is the Temporal Segment Network (TSN), but TSN cannot fully capture how an action changes. To fully exploit action change information along the time dimension, this paper proposes the Action-Related Network (ARN). ARN first uses a BN-Inception network to extract features of the action in the video, then concatenates the extracted video segment features with the features output by a Long Short-Term Memory (LSTM) module, and finally performs classification. In this way, ARN takes both the static and the dynamic information of an action into account. Experimental results show that on the general dataset HMDB-51, ARN reaches a recognition accuracy of 73.33%, 7% higher than TSN; when more action information is added, ARN's accuracy exceeds TSN's by more than 10%. On the Something-Something V1 dataset, which contains more action changes, ARN achieves 28.12% accuracy, 51% higher than TSN. Finally, on several action categories of HMDB-51, this paper further analyzes how the recognition accuracy of ARN and TSN changes when each exploits more complete action information; ARN's per-category accuracy is more than 10 percentage points higher than TSN's. These results show that, by relating action changes, ARN makes fuller use of complete action information and thus effectively improves the recognition accuracy of changeable actions.
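
A quick arithmetic note on the figures above (an inference, assuming the quoted 7% and 51% gains are relative improvements rather than percentage points, which the separate "more than 10 percentage points" per-category statement suggests): the implied TSN baselines would be roughly 73.33% / 1.07 ≈ 68.5% on HMDB-51 and 28.12% / 1.51 ≈ 18.6% on Something-Something V1.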

Key words: Computer vision, Deep learning, Action-related network, Action recognition

Abstract: When modeling complete actions in video, the commonly used method is the Temporal Segment Network (TSN), but TSN cannot fully capture the change information of an action. In order to fully exploit the change information of an action in the time dimension, the Action-Related Network (ARN) is proposed. Firstly, the BN-Inception network is used to extract the features of the action in the video; then the extracted video segment features are concatenated with the features output by a Long Short-Term Memory (LSTM) module, and finally classification is performed. With this approach, ARN takes into account both the static and the dynamic information of an action. Experiments show that on the general dataset HMDB-51, the recognition accuracy of ARN is 73.33%, which is 7% higher than that of TSN; when more action information is used, the recognition accuracy of ARN is more than 10% higher than that of TSN. On the Something-Something V1 dataset, which contains more action changes, the recognition accuracy of ARN is 28.12%, which is 51% higher than that of TSN. Finally, on several action categories of the HMDB-51 dataset, this paper further analyzes how the recognition accuracy of ARN and TSN changes when each uses more complete action information; the per-category recognition accuracy of ARN is more than 10 percentage points higher than that of TSN. These results show that, by relating action changes, ARN makes fuller use of complete action information, thereby effectively improving the recognition accuracy of changeable actions.
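
To make the fusion described in the abstract concrete, the following is a minimal PyTorch-style sketch (an illustration, not the authors' released implementation): a generic 2D-CNN backbone stands in for BN-Inception, per-segment features are passed through an LSTM to capture how the action changes, and the aggregated segment features are concatenated with the last LSTM output before classification. The class and parameter names (ActionRelatedNet, feat_dim, hidden_dim) are hypothetical, and averaging the segment features is an assumed aggregation step.

import torch
import torch.nn as nn

class ActionRelatedNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, backbone, feat_dim=1024, hidden_dim=512, num_classes=51):
        super().__init__()
        self.backbone = backbone  # e.g. a BN-Inception-like 2D CNN: (N, C, H, W) -> (N, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)  # models change across segments
        self.fc = nn.Linear(feat_dim + hidden_dim, num_classes)      # classifier over fused features

    def forward(self, frames):
        # frames: (batch, num_segments, C, H, W), one sampled frame per segment
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))  # (b*t, feat_dim), per-segment static features
        feats = feats.view(b, t, -1)                 # (b, t, feat_dim)
        static = feats.mean(dim=1)                   # assumed TSN-style segment consensus (static information)
        dynamic, _ = self.lstm(feats)                # (b, t, hidden_dim), action change information
        fused = torch.cat([static, dynamic[:, -1]], dim=1)  # concatenate static and dynamic features
        return self.fc(fused)                        # final classification

The design point mirrored here is that the classifier sees both the static segment features and the LSTM's dynamic features, rather than the segment consensus alone as in TSN.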

Key words: Action recognition, Action-related network, Computer vision, Deep learning

CLC Number: 

  • TP391
[1] SOOMRO K,ZAMIR A R,SHAH M.UCF101:A dataset of 101 human actions classes from videos in the wild[J].arXiv:1212.0402,2012.
[2] KUEHNE H,JHUANG H,GARROTE E,et al.HMDB:a large video database for human motion recognition[C]//2011 International Conference on Computer Vision.IEEE,2011:2556-2563.
[3] GOYAL R,KAHOU S E,MICHALSKI V,et al.The Something Something Video Database for Learning and Evaluating Visual Common Sense[C]//ICCV.2017,1(2):3.
[4] KAY W,CARREIRA J,SIMONYAN K,et al.The kinetics human action video dataset[J].arXiv:1705.06950,2017.
[5] RUSSAKOVSKY O,DENG J,SU H,et al.ImageNet large scale visual recognition challenge[J].International Journal of Computer Vision,2015,115(3):211-252.
[6] SIMONYAN K,ZISSERMAN A.Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems.2014:568-576.
[7] TRAN D,BOURDEV L D,FERGUS R,et al.C3D:generic features for video analysis[J].arXiv:1412.0767,2014.
[8] DONAHUE J,ANNE HENDRICKS L,GUADARRAMA S,et al.Long-term recurrent convolutional networks for visual recognition and description[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:2625-2634.
[9] CARREIRA J,ZISSERMAN A.Quo vadis,action recognition? a new model and the kinetics dataset[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:6299-6308.
[10] QIU Z,YAO T,MEI T.Learning spatio-temporal representation with pseudo-3D residual networks[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:5533-5541.
[11] XIE S,SUN C,HUANG J,et al.Rethinking spatiotemporal feature learning:Speed-accuracy trade-offs in video classification[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:305-321.
[12] TRAN D,WANG H,TORRESANI L,et al.A closer look at spatiotemporal convolutions for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:6450-6459.
[13] CRASTO N,WEINZAEPFEL P,ALAHARI K,et al.MARS:Motion-Augmented RGB Stream for Action Recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2019:7882-7891.
[14] SUN S,KUANG Z,SHENG L,et al.Optical flow guided feature:A fast and robust motion representation for video action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:1390-1399.
[15] WANG L,XIONG Y,WANG Z,et al.Temporal segment networks:Towards good practices for deep action recognition[C]//European Conference on Computer Vision.Springer,Cham,2016:20-36.
[16] ZHOU B L,ANDONIAN A,OLIVA A,et al.Temporal relational reasoning in videos[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:803-818.
[17] ZOLFAGHARI M,SINGH K,BROX T.Eco:Efficient convolutional network for online video understanding[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:695-712.
[18] MA C Y,CHEN M H,KIRA Z,et al.Ts-lstm and temporal-inception:Exploiting spatiotemporal dynamics for activity recognition[J].Signal Processing:Image Communication,2019,71:76-87.
[19] CHEN Q,ZHU X,LING Z,et al.Enhanced lstm for natural language inference[J].arXiv:1609.06038,2016.
[20] GAO L,GUO Z,ZHANG H,et al.Video captioning with attention-based LSTM and semantic consistency[J].IEEE Transactions on Multimedia,2017,19(9):2045-2055.
[21] SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[22] IOFFE S,SZEGEDY C.Batch normalization:Accelerating deep network training by reducing internal covariate shift[J].arXiv:1502.03167,2015.
[23] HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[24] HUANG G,LIU Z,VAN DER MAATEN L,et al.Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:4700-4708.