Computer Science ›› 2025, Vol. 52 ›› Issue (8): 251-258. doi: 10.11896/jsjkx.240900127
王佳, 夏英, 丰江帆
WANG Jia, XIA Ying, FENG Jiangfan
Abstract: Few-shot video action recognition aims to build effective learning models from a limited number of training samples, reducing the dependence of conventional action recognition on large-scale, finely annotated datasets. Most existing few-shot learning models classify videos by their pairwise similarity, but different action instances exhibit different spatiotemporal distributions, which causes temporal misalignment and action-evolution misalignment between query and support videos and degrades recognition performance. To address this problem, a two-stage spatiotemporal alignment network (TSAN) is proposed to improve the alignment precision of video data and thereby the accuracy of few-shot video action recognition. The network adopts a meta-learning framework. In the first stage, the action temporal alignment module (ATAM) constructs tuple-mode video frame pairs, subdivides video actions into sub-action sequences, and exploits the temporal information in the video data to improve the efficiency of few-shot learning. In the second stage, the action evolution alignment module (AEAM), together with its temporal synchronization module (TSM) and spatial coordination module (SCM), calibrates the query features to match the spatiotemporal action evolution of the support set, further improving recognition accuracy. Experimental results on four datasets (HMDB51, UCF101, SSV2100, and Kinetics100) show that TSAN achieves higher recognition accuracy than existing few-shot video action recognition methods.
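As an illustration of the two ideas the abstract names, the sketch below is a minimal, hypothetical analogue (not the paper's actual implementation): `frame_pairs` stacks consecutive per-frame features into tuple-style pairs, loosely mirroring ATAM's tuple-mode frame pairs, and `best_temporal_shift` aligns a query sequence to a support sequence by searching over cyclic shifts for the highest mean cosine similarity, a toy stand-in for temporal alignment. All function names and the shift-search strategy are assumptions for illustration.

```python
import numpy as np

def frame_pairs(features):
    """Stack consecutive frame features into tuple-style pairs.

    features: (T, D) array of per-frame features.
    Returns a (T-1, 2*D) array where row i concatenates frames i and i+1,
    a rough analogue of ATAM's tuple-mode frame pairs (illustrative only).
    """
    return np.concatenate([features[:-1], features[1:]], axis=1)

def best_temporal_shift(query, support):
    """Find the cyclic shift of the query sequence that best matches the
    support sequence under mean per-frame cosine similarity.

    query, support: (T, D) arrays with the same shape.
    Returns (best_shift, best_score).
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    scores = []
    for s in range(len(query)):
        shifted = np.roll(query, s, axis=0)  # shift frames forward by s
        scores.append(np.mean([cos(q, p) for q, p in zip(shifted, support)]))
    best = int(np.argmax(scores))
    return best, scores[best]
```

For example, a query that is the support sequence rolled two frames backward is recovered exactly at shift 2. The real network calibrates query features with learned modules (TSM/SCM) rather than an exhaustive shift search; this sketch only conveys the alignment objective.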