Computer Science (计算机科学), 2022, Vol. 49, Issue 1: 181-186. doi: 10.11896/jsjkx.201100164

• Computer Graphics & Multimedia •

  • Corresponding author: WU Gui-xing (gxwu@ustc.edu.cn)
  • Author e-mail: chelgan@mail.ustc.edu.cn
  • Supported by: Natural Science Foundation of Jiangsu Province (BK20141209)

Multi-scale Gated Graph Convolutional Network for Skeleton-based Action Recognition

GAN Chuang1, WU Gui-xing1,2, ZHAN Qing-yuan1, WANG Peng-kun1, PENG Zhi-lei1   

  1. School of Software Engineering, University of Science and Technology of China, Suzhou, Jiangsu 215000, China
    2. Suzhou Research Institute, University of Science and Technology of China, Suzhou, Jiangsu 215000, China
  • Received:2020-11-24 Revised:2021-04-06 Online:2022-01-15 Published:2022-01-18
  • About author: GAN Chuang, born in 1996, postgraduate. His main research interests include action recognition and spatio-temporal data analysis.
    WU Gui-xing, born in 1972, Ph.D, professor, Ph.D supervisor. His main research interests include computer vision, traffic flow forecasting, and so on.
  • Supported by:
    National Natural Science Foundation of China (61772171).

Abstract: Human action recognition is a highly challenging research topic with broad applications in security surveillance, human-computer interaction, and autonomous driving. In recent years, graph convolutional networks have achieved great success in modeling non-Euclidean structured data, offering a new approach to skeleton-based action recognition. Because the predefined skeleton graph contains substantial noise, existing methods mostly model spatial dependencies with high-order spatial features. However, focusing only on high-order subsets cannot reflect the dynamic correlations between vertices in a global manner. Moreover, the convolutional or recurrent neural networks that mainstream methods use to model temporal dependencies cannot capture multi-range temporal relations. To address these problems, this paper proposes a multi-scale gated graph convolutional network framework for skeleton-based action recognition. Specifically, a gated temporal convolution module is proposed to extract multi-period dependencies between vertices in the temporal domain, and a multi-dimensional attention mechanism is used to enhance the global representation of the graph. To validate the proposed method, experiments are conducted on two large-scale video action recognition benchmarks, NTU-RGB+D and Kinetics. The results show that the proposed method outperforms current state-of-the-art methods.

Keywords: Action recognition, Skeleton modality, Computer vision, Video classification, Graph convolution

Abstract: Skeleton-based human action recognition is attracting increasing attention in computer vision. Recently, graph convolutional networks (GCNs), which are powerful for modeling non-Euclidean structured data, have achieved promising performance and enabled a new paradigm for action recognition. Existing approaches mostly model spatial dependencies with an emphasis mechanism, since the large pre-defined graph contains substantial noise. However, simply emphasizing subsets is not optimal for reflecting the dynamic underlying correlations between vertices in a global manner. Furthermore, these methods are ineffective at capturing temporal dependencies, as CNNs and RNNs cannot model intricate multi-range temporal relations. To address these issues, a multi-scale gated graph convolutional network (MSG-GCN) is proposed for skeleton-based action recognition. Specifically, a gated temporal convolution module (G-TCM) is presented to capture both consecutive short-term and interval long-term dependencies between vertices in the temporal domain. Besides, a multi-dimensional attention module covering the spatial, temporal, and channel dimensions, which enhances the expressiveness of the spatial graph, is integrated into the GCN with negligible overhead. Extensive experiments on two large-scale benchmark datasets, NTU-RGB+D and Kinetics, demonstrate that our approach outperforms state-of-the-art baselines.
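The gated temporal convolution described above can be illustrated with a minimal numpy sketch. This is illustrative only: the function names, kernel sizes, and the WaveNet-style tanh/sigmoid gating form are assumptions, not the paper's exact G-TCM formulation; dilation stands in for the "interval long-term" reach.

```python
import numpy as np

def temporal_conv(x, w, dilation=1):
    # 1-D convolution along time with "same" padding; a dilation > 1
    # widens the receptive field to reach interval long-term frames.
    k = len(w)
    pad = (k - 1) * dilation // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(w, xp[t:t + k * dilation:dilation])
                     for t in range(len(x))])

def gated_tcn(x, w_filter, w_gate, dilation=1):
    # Gated activation: a tanh feature branch modulated by a sigmoid
    # gate that decides how much of each time step passes through.
    f = np.tanh(temporal_conv(x, w_filter, dilation))
    g = 1.0 / (1.0 + np.exp(-temporal_conv(x, w_gate, dilation)))
    return f * g
```

For a skeleton sequence this would be applied per joint and per channel; stacking branches with different dilations and merging them is one way to obtain the multi-range temporal coverage the abstract refers to.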

Key words: Action recognition, Computer vision, Graph convolution, Skeleton modality, Video classification
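The spatial step underlying such skeleton GCNs is commonly the symmetrically normalized graph convolution X' = ReLU(D^{-1/2}(A+I)D^{-1/2} X W). A minimal sketch follows; the function name and toy dimensions are hypothetical, and the paper's attention-enhanced adjacency is not modeled here.

```python
import numpy as np

def skeleton_gcn_layer(x, adj, weight):
    # x: (V, C_in) per-joint features, adj: (V, V) skeleton adjacency,
    # weight: (C_in, C_out) learnable projection.
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d = a_hat.sum(axis=1)
    a_norm = a_hat / np.sqrt(np.outer(d, d))     # D^-1/2 (A+I) D^-1/2
    return np.maximum(a_norm @ x @ weight, 0.0)  # ReLU
```

Each output joint feature is thus a degree-normalized average over its skeletal neighbors, which is why a learned or attention-weighted adjacency (as in the abstract) matters: the fixed graph alone cannot express dynamic correlations between distant joints.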

CLC Number: TP183