Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 220800147-6. DOI: 10.11896/jsjkx.220800147

• Image Processing & Multimedia Technology •


Lightweight Graph Convolution Action Recognition Algorithm Based on Multi-stream Fusion

LI Hua, ZHAO Lingdi, CHEN Yujie, YANG Yang, DU Xinzhao   

  1. College of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130022, China
  • Published: 2023-11-09
  • Corresponding author: ZHAO Lingdi (1025205283@qq.com)
  • About author: LI Hua (lihua@cust.edu.cn), born in 1977, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include computer vision and virtual reality technology.
    ZHAO Lingdi, born in 1997, master. Her main research interest is virtual reality.
  • Supported by:
    National Natural Science Foundation of China (U19A2063) and the Natural Science Foundation Project of the Science and Technology Department of Jilin Province (20210101412JC)


Abstract: Traditional RGB-based action recognition is easily affected by factors such as illumination intensity and viewing angle. Skeleton-based action recognition is less sensitive to these factors and has become one of the mainstream approaches. However, current skeleton-based methods still have large numbers of parameters and slow inference. To address these problems, a multi-stream fusion lightweight graph convolution action recognition framework is proposed. First, data fusing joint, bone, joint motion and bone motion information are fed into the spatial graph convolution module. Second, a spatial attention mechanism is added to the spatial graph convolution module to better capture the relationships between joints. Finally, depthwise convolution and pointwise convolution are used in the temporal convolution module to reduce the number of parameters. Compared with the baseline network SGN on the NTU-RGB+D 120 dataset, the proposed network improves accuracy by 2.3% under the cross-subject evaluation and by 1.9% under the cross-setup evaluation, while reducing the number of parameters by 0.12×10⁶, which verifies its effectiveness.
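The abstract's two concrete ingredients, the fused multi-stream input and the depthwise-plus-pointwise temporal convolution, can be illustrated with the minimal PyTorch sketch below. This is not the authors' released code: the (N, C, T, V) tensor layout, the placeholder parent list, the 256-channel width and the kernel size of 9 are illustrative assumptions, and the fusion and attention details of the actual network may differ.

# Minimal sketch of multi-stream feature construction and a depthwise separable
# temporal convolution; all sizes below are illustrative assumptions.
import torch
import torch.nn as nn


def build_streams(joints: torch.Tensor, parents: list) -> torch.Tensor:
    """Derive bone and motion streams from joint coordinates of shape
    (N, C=3, T, V) and fuse them by channel-wise concatenation."""
    bones = joints - joints[..., parents]                            # joint minus its parent joint
    joint_motion = torch.zeros_like(joints)
    joint_motion[:, :, 1:] = joints[:, :, 1:] - joints[:, :, :-1]    # frame-to-frame difference
    bone_motion = torch.zeros_like(bones)
    bone_motion[:, :, 1:] = bones[:, :, 1:] - bones[:, :, :-1]
    return torch.cat([joints, bones, joint_motion, bone_motion], dim=1)


class DepthwiseSeparableTemporalConv(nn.Module):
    """Temporal convolution split into a per-channel (depthwise) filter and a
    1x1 (pointwise) channel mixer, cutting the weights from C*C*k to C*k + C*C."""

    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        pad = (kernel_size - 1) // 2
        self.depthwise = nn.Conv2d(channels, channels, (kernel_size, 1),
                                   padding=(pad, 0), groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))


if __name__ == "__main__":
    # 2 samples, 3D coordinates, 20 frames, 25 joints (NTU-style skeleton).
    joints = torch.randn(2, 3, 20, 25)
    parents = [0] * 25                       # placeholder parent indices, not the real NTU tree
    fused = build_streams(joints, parents)   # shape (2, 12, 20, 25)

    channels, kernel = 256, 9
    separable = DepthwiseSeparableTemporalConv(channels, kernel)
    standard = nn.Conv2d(channels, channels, (kernel, 1), padding=(4, 0), bias=False)
    count = lambda m: sum(p.numel() for p in m.parameters())
    print(fused.shape, count(separable), count(standard))   # 68,352 vs. 589,824 weights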

Key words: Human skeleton, Action recognition, Lightweight, Attention mechanism, Graph convolution

CLC Number: TP391

References
[1]DENG M L,GAO Z D,LI L,et al.Overview of Human Behavior Recognition Based on Deep Learning[J].Computer Engineering and Applications,2022,58(13):14-26.
[2]CAI Q,DENG Y B,LI H S,et al.Survey on Human Action Recognition Based on Deep Learning[J].Computer Science,2020,47(4):85-93.
[3]SU B Y,WU H,SHENG M,et al.Accurate Hierarchical Human Actions Recognition From Kinect Skeleton Data[J].IEEE Access,2019,7.
[4]LI M H,XU H J,SHI L X,et al.Multi-person Activity Recognition Based on Bone Keypoints Detection[J].Computer Science,2021,48(4):138-143.
[5]JIANG Q Y,WU X J,XU T Y.M2FA:multi-dimensional feature fusion attention mechanism for skeleton-based action recognition[J].Journal of Image and Graphics,2022,27(8):2391-2403.
[6]LEE J,LEE M,LEE D,et al.Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition[J].arXiv:2208.10741,2022.
[7]DUAN H,ZHAO Y,XIONG Y,et al.Omni-sourced webly-supervised learning for video recognition[C]//European Conference on Computer Vision.Cham:Springer,2020:670-688.
[8]ATEFE A,ALI N,EBRAHIMI M M.Sparse Deep LSTMs with Convolutional Attention for Human Action Recognition[J].SN Computer Science,2021,2(3).
[9]CHEN Y,ZHANG Z,YUAN C,et al.Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:13359-13368.
[10]LI C,ZHONG Q,XIE D,et al.Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[J].arXiv:1804.06055,2018.
[11]DU Y,WANG W,WANG L.Hierarchical recurrent neural network for skeleton based action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:1110-1118.
[12]YAN S,XIONG Y,LIN D.Spatial temporal graph convolutional networks for skeleton-based action recognition[C]//Thirty-second AAAI Conference on Artificial Intelligence.2018.
[13]SHI L,ZHANG Y,CHENG J,et al.Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:12026-12035.
[14]QIN Z Y,LIU Y,JI P,et al.Fusing Higher-Order Features in Graph Neural Networks for Skeleton-Based Action Recognition[J].arXiv:2105.01563,2022.
[15]CHENG K,ZHANG Y,HE X,et al.Skeleton-based action recognition with shift graph convolutional network[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.2020.
[16]LIU Z,ZHANG H,CHEN Z,et al.Disentangling and unifying graph convolutions for skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:143-152.
[17]DUAN H,ZHAO Y,CHEN K,et al.Revisiting skeleton-based action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:2969-2978.
[18]ZHANG P,LAN C,ZENG W,et al.Semantics-guided neural networks for efficient skeleton-based human action recognition[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.2020.
[19]HOWARD A G,ZHU M,CHEN B,et al.Mobilenets:Efficient convolutional neural networks for mobile vision applications[J].arXiv:1704.04861,2017.
[20]SANDLER M,HOWARD A,ZHU M,et al.Mobilenetv2:Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:4510-4520.
[21]HOWARD A,SANDLER M,CHU G,et al.Searching for MobileNetV3[J].arXiv:1905.02244,2019.
[22]WANG Q,WU B,ZHU P,et al.ECA-Net:Efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.2020.
[23]SHAHROUDY A,LIU J,NG T T,et al.NTU RGB+D:A Large Scale Dataset for 3D Human Activity Analysis[J].arXiv:1604.02808,2016.
[24]LIU J,SHAHROUDY A,PEREZ M,et al.NTU RGB+D 120:A Large-Scale Benchmark for 3D Human Activity Understanding[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2019,42(10).
[25]CHEN Y S,YA J,WEI W,et al.Skeleton-based action recognition with hierarchical spatial reasoning and temporal stack learning network[J].Pattern Recognition,2020,107.
[26]SONG Y F,ZHANG Z,WANG L.Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons[J].arXiv:1905.06774,2019.
[27]LI M S,CHEN S H,CHEN X,et al.Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition[J].arXiv:1904.12659,2019.
[28]SONG Y F,ZHANG Z,SHAN C,et al.Richly Activated Graph Convolutional Network for Robust Skeleton-Based Action Recognition[J].IEEE Transactions on Circuits and Systems for Video Technology,2021,31(5).
[29]PENG W,SHI J,ZHAO G.Spatial temporal graph deconvolutional network for skeleton-based human action recognition[J].IEEE Signal Processing Letters,2021,28:244-248.