Computer Science ›› 2020, Vol. 47 ›› Issue (9): 123-128. doi: 10.11896/jsjkx.190800101

• Computer Graphics & Multimedia •

  • Corresponding author: XU Juan (juanxu@nuaa.edu.cn)
  • First author: HE Xin (hexin@nuaa.edu.cn)

Action-Related Network: Towards Modeling Complete Changeable Action

HE Xin1, XU Juan1,2, JIN Ying-ying1   

  1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
    2 Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 210096, China
  • Received: 2019-08-22 Published: 2020-09-10
  • About author: HE Xin, born in 1995, postgraduate, is a member of China Computer Federation. His main research interests include deep learning and action recognition.
    XU Juan, born in 1981, associate professor, is a member of China Computer Federation. Her main research interests include quantum computing and quantum information, cloud computing, and deep learning.


Abstract: The Temporal Segment Network (TSN) is the commonly used method for modeling a complete action in video, but TSN cannot fully capture the change information of the action. To fully exploit this change information in the temporal dimension, the Action-Related Network (ARN) is proposed. First, a BN-Inception network extracts features of the action from the video; then the extracted segment features are concatenated with the features output by a Long Short-Term Memory (LSTM) module; finally, the combined features are classified. In this way, ARN accounts for both the static and the dynamic information of the action. Experiments show that on the general HMDB-51 dataset, ARN achieves a recognition accuracy of 73.33%, which is 7% higher than that of TSN; when more action information is supplied, ARN's accuracy exceeds TSN's by more than 10%. On the Something-Something V1 dataset, which contains many changing actions, ARN achieves 28.12% accuracy, 51% higher than TSN. Finally, for several action categories of HMDB-51, this paper further analyzes how the recognition accuracy of ARN and TSN changes as each uses more complete action information; ARN's per-category accuracy is more than 10 percentage points higher than TSN's. These results show that by relating action changes, ARN makes fuller use of the complete action information and thus effectively improves the recognition accuracy of changeable actions.
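The fusion described above can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the BN-Inception backbone is replaced by random stand-in segment features, the "static" branch is assumed to be a TSN-style average over segments, and all dimensions (16-d features, 8-d LSTM hidden state, 3 segments, 5 classes) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(xs, W, U, b, h0, c0):
    """Run a single-layer LSTM over a sequence of feature vectors.

    xs: (T, D) inputs; gate weights stacked in the order [i, f, g, o].
    Returns the final hidden state, summarizing how features change.
    """
    h, c = h0, c0
    H = h.shape[0]
    for x in xs:
        z = W @ x + U @ h + b          # all four gates at once, shape (4H,)
        i = sigmoid(z[0:H])            # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell state
        o = sigmoid(z[3 * H:4 * H])    # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

D, H, K, C = 16, 8, 3, 5  # feature dim, hidden dim, segments, classes (hypothetical)

# Stand-in for BN-Inception: one feature vector per video segment.
segment_feats = rng.standard_normal((K, D))

# Dynamic branch: an LSTM over the segment sequence models action change.
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h_dyn = lstm_forward(segment_feats, W, U, b, np.zeros(H), np.zeros(H))

# Static branch (assumed TSN-style consensus): average the segment features.
f_static = segment_feats.mean(axis=0)

# ARN fusion: concatenate static and dynamic features, then classify.
fused = np.concatenate([f_static, h_dyn])  # shape (D + H,)
W_cls = rng.standard_normal((C, D + H)) * 0.1
logits = W_cls @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)
```

The point of the concatenation is that the classifier sees both views at once: the averaged segment features carry appearance (static) information, while the LSTM hidden state carries ordering (dynamic) information that a plain segment average discards.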

Key words: Action recognition, Action-related network, Deep learning, Computer vision

CLC Number: TP391