Computer Science (计算机科学), 2022, Vol. 49, Issue 1: 181-186. doi: 10.11896/jsjkx.201100164
干创1, 吴桂兴1,2, 詹庆原1, 王鹏焜1, 彭志磊1
GAN Chuang1, WU Gui-xing1,2, ZHAN Qing-yuan1, WANG Peng-kun1, PENG Zhi-lei1
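Abstract: Human action recognition is a highly challenging research topic with wide applications in security surveillance, human-computer interaction, and autonomous driving. In recent years, graph convolutional networks have achieved great success in modeling non-Euclidean structured data, offering a new approach to skeleton-based action recognition. Because the predefined skeleton graph contains a large amount of noise, existing methods mostly use high-order spatial features to model spatial dependencies. However, attending only to high-order subsets cannot reflect the dynamic correlations between vertices at a global level. In addition, the convolutional neural networks or recurrent neural networks that mainstream methods use to model temporal dependencies cannot capture multi-range temporal relationships. To address these problems, this paper proposes a skeleton-based multi-level gated graph convolutional network framework for action recognition. Specifically, a gated temporal convolution module is proposed to extract multi-period dependencies between vertices in the temporal domain, and a multi-dimensional attention mechanism is used to enhance the global representation of the graph. To validate the effectiveness of the proposed method, experiments are conducted on two large-scale video action recognition benchmark datasets, NTU-RGB+D and Kinetics. The results show that the proposed method outperforms current state-of-the-art methods.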
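The abstract does not spell out how the gated temporal convolution module is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a common gated-activation form (a tanh "filter" branch multiplied by a sigmoid "gate" branch) and assumes that several dilation rates stand in for the multiple temporal ranges. The function names, the odd-length kernels, and the dilation set `(1, 2, 4)` are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, used here as the gate nonlinearity (an assumption)."""
    return 1.0 / (1.0 + math.exp(-x))


def dilated_conv1d(seq, kernel, dilation):
    """Zero-padded 'same' 1D convolution over time with a given dilation.

    Assumes an odd-length kernel so the output stays aligned with the input.
    """
    half = (len(kernel) // 2) * dilation
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - half + i * dilation
            if 0 <= idx < len(seq):  # positions outside the sequence count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def gated_temporal_conv(seq, filter_kernel, gate_kernel, dilations=(1, 2, 4)):
    """Sum tanh(filter) * sigmoid(gate) over several dilation rates.

    Each dilation rate covers a different temporal range; the sigmoid gate
    scales how much of each filtered response is passed on.
    """
    out = [0.0] * len(seq)
    for d in dilations:
        f = dilated_conv1d(seq, filter_kernel, d)
        g = dilated_conv1d(seq, gate_kernel, d)
        for t in range(len(seq)):
            out[t] += math.tanh(f[t]) * sigmoid(g[t])
    return out


# Hypothetical usage: one coordinate of one joint tracked over eight frames.
trajectory = [0.0, 0.1, 0.3, 0.6, 0.4, 0.2, 0.1, 0.0]
features = gated_temporal_conv(trajectory, [0.5, 1.0, 0.5], [0.2, 0.0, -0.2])
```

In a full model such a module would operate on a tensor of shape (channels, frames, joints) with learned kernels; the scalar version above only illustrates the gating and the multi-range structure.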