Computer Science ›› 2022, Vol. 49 ›› Issue (1): 181-186. doi: 10.11896/jsjkx.201100164

• Computer Graphics & Multimedia •

Multi-scale Gated Graph Convolutional Network for Skeleton-based Action Recognition

GAN Chuang1, WU Gui-xing1,2, ZHAN Qing-yuan1, WANG Peng-kun1, PENG Zhi-lei1   

  1 School of Software Engineering, University of Science and Technology of China, Suzhou, Jiangsu 215000, China
  2 Suzhou Research Institute, University of Science and Technology of China, Suzhou, Jiangsu 215000, China
  • Received:2020-11-24 Revised:2021-04-06 Online:2022-01-15 Published:2022-01-18
  • About author: GAN Chuang, born in 1996, postgraduate. His main research interests include action recognition and spatio-temporal data analysis.
    WU Gui-xing, born in 1972, Ph.D, professor, Ph.D supervisor. His main research interests include computer vision and traffic flow forecasting.
  • Supported by:
    National Natural Science Foundation of China(61772171).

Abstract: Skeleton-based human action recognition is attracting increasing attention in computer vision. Recently, graph convolutional networks (GCNs), which are powerful for modeling non-Euclidean structured data, have achieved promising performance and enabled a new paradigm for action recognition. Existing approaches mostly model the spatial dependency with an emphasis mechanism, since the huge pre-defined graph contains large quantities of noise. However, simply emphasizing subsets is not optimal for reflecting the dynamic underlying correlations between vertices in a global manner. Furthermore, these methods are ineffective at capturing temporal dependencies, as CNNs and RNNs are not capable of modeling the intricate multi-range temporal relations. To address these issues, a multi-scale gated graph convolutional network (MSG-GCN) is proposed for skeleton-based action recognition. Specifically, a gated temporal convolution module (G-TCM) is presented to capture the consecutive short-term and interval long-term dependencies between vertices in the temporal domain. In addition, a multi-dimensional attention module covering the spatial, temporal, and channel dimensions, which enhances the expressiveness of the spatial graph, is integrated into the GCNs with negligible overhead. Extensive experiments on two large-scale benchmark datasets, NTU-RGB+D and Kinetics, demonstrate that our approach outperforms the state-of-the-art baselines.
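
The abstract does not give the internal design of the G-TCM, so the following is only a minimal sketch of the general idea of a gated, multi-scale temporal convolution, not the authors' module. It assumes a tanh/sigmoid gate, a dilation of 1 for consecutive short-term context and a larger dilation for interval long-term context, and averaging of the branches. All function names, kernels, and dilation values are hypothetical and chosen for illustration.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded 'same' 1D convolution along the time axis.

    x is the sequence of one joint feature over T frames; kernel has odd length.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - half) * dilation  # frames spaced `dilation` apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_temporal_conv(x, filt_kernel, gate_kernel, dilation):
    """Assumed gate: tanh(filter branch) * sigmoid(gate branch)."""
    f = dilated_conv1d(x, filt_kernel, dilation)
    g = dilated_conv1d(x, gate_kernel, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def multi_scale_gated(x, filt_kernel, gate_kernel, dilations=(1, 3)):
    """Average gated branches: dilation 1 (short-term) and >1 (interval long-term)."""
    branches = [gated_temporal_conv(x, filt_kernel, gate_kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


# An impulse at frame 3 of a 7-frame sequence.
x = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
short = gated_temporal_conv(x, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], dilation=1)
long_range = gated_temporal_conv(x, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], dilation=3)
fused = multi_scale_gated(x, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

With the impulse input, the dilation-1 branch responds only at neighbouring frames, while the dilation-3 branch responds at frames three steps apart, which is the sense in which the two branches cover different temporal ranges. In a trained network the kernels would be learned and applied per channel and per vertex.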

Key words: Action recognition, Computer vision, Graph convolution, Skeleton modality, Video classification

CLC Number: 

  • TP183