Computer Science, 2023, Vol. 50, Issue (11): 97-106. doi: 10.11896/jsjkx.230500158
吴雨珊1,2, 徐增敏1,2, 张雪莲1,2, 王涛3
WU Yushan1,2, XU Zengmin1,2, ZHANG Xuelian1,2, WANG Tao3
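Abstract: Conventional self-supervised methods for skeleton data usually treat different augmentations of the same sample as positives and all other samples as negatives. This leaves the ratio of positive to negative samples severely imbalanced and limits the contribution of samples that carry the same semantic information. To address this problem, we propose DNNCLR, a double nearest neighbor retrieval action recognition algorithm whose positive samples are not restricted by data augmentation. First, based on the physical connections between human joints, we design a new joint-level spatial data augmentation, Bodypart augmentation, which applies random replacement with normally distributed arrays to the input skeleton sequence in order to obtain high-level semantic embeddings. Second, to keep positive samples from being restricted by data augmentation, we propose a more reasonable positive-sample expansion strategy, double nearest neighbor retrieval (DNN), together with a corresponding contrastive loss, DNN Loss. Specifically, a support set is used for global retrieval, which extends the search range for positives to new data points that ordinary data augmentation cannot cover. The negative set, however, contains misjudged positives: skeleton samples that come from different videos but share the same semantic information. Nearest neighbor retrieval is therefore applied a second time to find these latent positives in the negative set and expand the positive set again. The resulting DNN Loss forces the model to learn more general feature representations and makes model optimization more reasonable. Finally, the DNNCLR algorithm is applied to the AimCLR model to obtain the AimDNNCLR model, which is evaluated on the NTU-RGB+D dataset under linear evaluation. Compared with state-of-the-art models, the proposed method improves accuracy by 3.6% on average.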
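The two retrieval steps described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the formula for DNN Loss, so everything here is an assumption made for illustration: the function name `dnn_loss`, the separate `support` and `negatives` arrays, the `top_k` and `tau` parameters, the single-query setting, and the InfoNCE-style loss with several positives. It should not be read as the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def logsumexp(v):
    """Numerically stable log(sum(exp(v))) for a 1-D array."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))


def dnn_loss(query, key, support, negatives, top_k=1, tau=0.07):
    """Hypothetical double nearest neighbor contrastive loss for one query.

    query, key : (d,) embeddings of two augmentations of the same sample
    support    : (s, d) support-set embeddings (assumed layout)
    negatives  : (n, d) negative-set embeddings (assumed layout)
    """
    query, key = l2_normalize(query), l2_normalize(key)
    support, negatives = l2_normalize(support), l2_normalize(negatives)

    # Retrieval 1: the key's nearest neighbor in the support set becomes
    # an extra positive that augmentation alone would not produce.
    nn_positive = support[np.argmax(support @ key)]

    # Retrieval 2: the top_k negatives most similar to the key are treated
    # as latent positives instead of being pushed away.
    mined = np.argsort(-(negatives @ key))[:top_k]
    is_positive = np.zeros(len(negatives), dtype=bool)
    is_positive[mined] = True

    # Temperature-scaled cosine similarities between the query and every
    # candidate: two explicit positives, then the whole negative set.
    pos_logits = np.array([query @ key, query @ nn_positive]) / tau
    bank_logits = (negatives @ query) / tau
    logits = np.concatenate([pos_logits, bank_logits])
    pos_mask = np.concatenate([[True, True], is_positive])

    # -log( sum over positives / sum over all candidates )
    loss = logsumexp(logits) - logsumexp(logits[pos_mask])
    return loss, mined
```

Calling `dnn_loss(query, key, support, negatives, top_k=0)` disables the second retrieval, which makes it easy to compare the loss with and without mined positives. In this sketch, moving a similar sample from the negative side to the positive side can only lower the loss, which matches the abstract's motivation of no longer penalizing samples that share the same semantics.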