Computer Science (计算机科学), 2025, Vol. 52, Issue 10: 106-114. doi: 10.11896/jsjkx.240800108
ZHAO Chen (赵晨), PENG Jian (彭舰), HUANG Junhao (黄军豪)
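Abstract: In recent years, skeleton-based action recognition has attracted wide attention from researchers and has made considerable progress. Graph convolutional networks (GCNs) and convolutional neural networks (CNNs), as powerful and effective model paradigms, are both widely used in this field. However, existing methods have two limitations: 1) most GCN-based methods model time and space separately, which hinders direct interaction between spatial and temporal information; 2) CNN-based methods model spatial-temporal information effectively but, compared with GCN-based methods, make poor use of spatial information. To address these problems, this paper proposes a novel spatial-temporal information aggregation operation called Spatial-Temporal Joint Mapping (STJM). The method combines the graph topology information used by GCN-based methods with the CNN-based approach of aggregating spatial and temporal information simultaneously. Compared with conventional GCN methods, it maps joints into a higher-dimensional space, which gives it stronger representational capacity. After this high-dimensional joint mapping, a single simple τ×K convolution kernel is enough to aggregate temporal and spatial features at the same time. Because STJM is a new spatial-temporal aggregation module, many GCN-based topology enhancement strategies can be applied to the STJM block. Experiments show that using STJM as a plug-and-play module in existing models yields significant performance improvements on two large-scale skeleton datasets, NTU RGB+D 60 and NTU RGB+D 120.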
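The mapping-then-convolve idea in the abstract can be illustrated with a minimal NumPy sketch. This is one possible reading and not the authors' implementation: the neighbor table `neighbors`, the helper names `joint_mapping`, `stjm_aggregate`, and `chain_neighbors`, the toy chain topology, and the zero padding along time are all assumptions made for illustration. Each joint is first expanded into K topology-selected joints, and a single τ×K kernel then mixes τ frames and K mapped joints in one step.

```python
import numpy as np


def joint_mapping(x, neighbors):
    """Map each joint to K topology-selected joints (hypothetical scheme).

    x:         (C, T, V) features: channels x frames x joints
    neighbors: (V, K) integer table; row v lists the K joints
               (including v itself) that joint v is mapped to
    returns:   (C, T, V, K)
    """
    return x[:, :, neighbors]


def stjm_aggregate(x, neighbors, weight):
    """Aggregate tau frames and K mapped joints with one tau x K kernel.

    weight:  (C_out, C_in, tau, K); tau is assumed odd so that zero
             padding keeps the number of frames unchanged
    returns: (C_out, T, V)
    """
    c_out, c_in, tau, k = weight.shape
    assert tau % 2 == 1, "this sketch assumes an odd temporal kernel size"
    assert x.shape[0] == c_in and neighbors.shape[1] == k
    mapped = joint_mapping(x, neighbors)            # (C_in, T, V, K)
    pad = tau // 2
    padded = np.pad(mapped, ((0, 0), (pad, pad), (0, 0), (0, 0)))
    n_frames, n_joints = x.shape[1], x.shape[2]
    out = np.zeros((c_out, n_frames, n_joints))
    for dt in range(tau):                           # slide over the time offsets
        window = padded[:, dt:dt + n_frames]        # (C_in, T, V, K)
        # Sum over input channels and the K mapped joints for this offset.
        out += np.einsum("ock,ctvk->otv", weight[:, :, dt, :], window)
    return out


def chain_neighbors(n_joints):
    """Toy topology: joints on a chain, K = 3 (previous, self, next)."""
    idx = np.arange(n_joints)
    prev_idx = np.clip(idx - 1, 0, n_joints - 1)
    next_idx = np.clip(idx + 1, 0, n_joints - 1)
    return np.stack([prev_idx, idx, next_idx], axis=1)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 5))            # C=2, T=6, V=5
neighbors = chain_neighbors(5)                # shape (5, 3)
weight = rng.standard_normal((4, 2, 3, 3))    # C_out=4, C_in=2, tau=3, K=3
y = stjm_aggregate(x, neighbors, weight)
print(y.shape)                                # (4, 6, 5)
```

In this sketch the graph topology enters only through the `neighbors` table, so swapping in a different or learned table would change the spatial aggregation without touching the convolution itself.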