Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231000073-5. DOI: 10.11896/jsjkx.231000073

• Image Processing & Multimedia Technology •

Bottleneck Multi-scale Graph Convolutional Action Recognition Method Based on Skeleton Features

HUANG Haixin, WANG Yuyao, CAI Mingqi

  1. School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, China
  • Online: 2024-11-16 Published: 2024-11-13
  • Corresponding author: HUANG Haixin (huanghaixin@sylu.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (61672359)

Bottleneck Multi-scale Graph Convolutional Network for Skeleton-based Action Recognition

HUANG Haixin, WANG Yuyao, CAI Mingqi   

  1. School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, China
  • Online: 2024-11-16 Published: 2024-11-13
  • About author: HUANG Haixin, born in 1973, Ph.D., associate professor. Her main research interests include machine learning, artificial intelligence and smart grid.
  • Supported by:
    National Natural Science Foundation of China (61672359).

Abstract: Action recognition methods have achieved remarkable results in the field of computer vision. Graph convolutional networks are an important tool for action recognition tasks and show clear advantages in extracting features from graph-structured data. However, existing graph convolutional action recognition networks still have problems such as excessive reliance on predefined skeleton topology graphs and the high computational cost and inflexibility of large temporal convolution kernels, which greatly limit the expressive power and robustness of the model. This paper proposes an adaptive bottleneck multi-scale graph convolutional action recognition method based on skeleton data. The adaptive spatial module learns and optimizes the skeleton topology graph structure and its parameters, enhancing the model's flexibility and adaptability; the bottleneck multi-scale temporal module improves temporal modeling capability and saves computational cost and parameters by reducing the channel width. To verify the effectiveness of the proposed method, experiments are conducted on the large-scale skeleton action recognition datasets NTU-RGB+D and NTU-RGB+D 120. The results show that the accuracy of the improved algorithm is improved to a certain extent.

Key words: Action recognition, Skeleton modality, Graph convolution network, Video classification, Computer vision

Abstract: Action recognition methods have achieved significant success in the field of computer vision. Graph convolutional networks (GCNs) are a crucial technique for action recognition tasks, especially for extracting features from graph-structured data. However, existing GCN-based models suffer from limitations such as an excessive reliance on predefined skeleton topology graphs and the high computational cost and inflexibility of large temporal convolution kernels, which significantly constrain their expressive power and robustness. In this paper, we propose an adaptive bottleneck multi-scale graph convolutional action recognition method based on skeleton data. The adaptive spatial module optimizes the skeleton topology graph structure and parameters, enhancing the model's flexibility. The bottleneck multi-scale temporal module improves temporal modeling capability while reducing the channel width to save computational cost and parameters. Experimental results on the large-scale skeleton action recognition datasets NTU-RGB+D and NTU-RGB+D 120 show that the accuracy of our model is improved to a certain extent.
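
To make the two modules described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released implementation: an AdaptiveGraphConv that adds a fully learnable matrix to the predefined skeleton adjacency, and a BottleneckMultiScaleTCN whose branches shrink the channel width with 1x1 convolutions before applying temporal convolutions of different dilations. Class names, channel counts, kernel sizes and dilation rates are illustrative assumptions.

```python
# Illustrative sketch only; module names and hyperparameters are assumptions,
# not the paper's released implementation.
import torch
import torch.nn as nn


class AdaptiveGraphConv(nn.Module):
    """Spatial graph convolution whose topology is partly learned.

    The predefined skeleton adjacency A (V x V) is kept as a buffer, and a
    learnable matrix B of the same shape is added to it, so the graph
    structure can be optimized end to end instead of being fixed.
    """

    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        self.register_buffer("A", A.clone())            # predefined topology
        self.B = nn.Parameter(torch.zeros_like(A))      # learned topology offset
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                               # x: (N, C, T, V)
        adj = self.A + self.B                           # adaptive adjacency
        x = torch.einsum("nctv,vw->nctw", x, adj)       # aggregate over joints
        return self.proj(x)


class BottleneckMultiScaleTCN(nn.Module):
    """Multi-scale temporal convolution with 1x1 bottlenecks.

    Each dilated branch first reduces the channel width with a 1x1 convolution
    (the bottleneck), so covering a large temporal receptive field costs far
    fewer parameters than one wide-kernel convolution over the full channels.
    """

    def __init__(self, channels, kernel_size=5, dilations=(1, 2)):
        super().__init__()
        branch_channels = channels // (len(dilations) + 1)
        self.branches = nn.ModuleList()
        for d in dilations:
            pad = (kernel_size - 1) * d // 2            # keeps T unchanged
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, branch_channels, kernel_size=1),  # bottleneck
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(branch_channels, branch_channels,
                          kernel_size=(kernel_size, 1),
                          padding=(pad, 0), dilation=(d, 1)),
                nn.BatchNorm2d(branch_channels),
            ))
        # A plain 1x1 branch keeps short-range information and fills the
        # remaining channels so the concatenated output width equals `channels`.
        rest = channels - branch_channels * len(dilations)
        self.branches.append(nn.Sequential(
            nn.Conv2d(channels, rest, kernel_size=1),
            nn.BatchNorm2d(rest),
        ))

    def forward(self, x):                               # x: (N, C, T, V)
        return torch.cat([branch(x) for branch in self.branches], dim=1)


if __name__ == "__main__":
    # Shape check on random data: 2 clips, 64 channels, 32 frames, 25 joints
    # (25 joints matches the NTU-RGB+D skeleton); identity adjacency is a placeholder.
    A = torch.eye(25)
    block = nn.Sequential(AdaptiveGraphConv(64, 64, A), BottleneckMultiScaleTCN(64))
    print(block(torch.randn(2, 64, 32, 25)).shape)      # torch.Size([2, 64, 32, 25])
```

Concatenating several narrow branches instead of widening a single temporal kernel is the sense in which the bottleneck saves parameters; the block in the paper may differ in branch count, normalization and residual connections.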

Key words: Action recognition, Skeleton modality, Graph convolution network, Video classification, Computer vision

CLC number: TP183