Computer Science ›› 2025, Vol. 52 ›› Issue (10): 366-373. doi: 10.11896/jsjkx.240700045

• Information Security •

Boundary Black-box Adversarial Example Generation Algorithm on Video Recognition Models

JING Yulin, WU Lijun, LI Zhiyuan, DENG Qi

  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Received: 2024-07-08 Revised: 2025-03-01 Online: 2025-10-15 Published: 2025-10-14
  • Corresponding author: WU Lijun (wljuestc@sina.com)
  • About author: JING Yulin (jingyulin@std.uestc.edu.cn), born in 1989, postgraduate. His main research interests include artificial intelligence security and computer vision.
    WU Lijun, born in 1965, professor. His main research interests include artificial intelligence and information security.

Abstract: With the rapid development of deep learning, neural networks are widely used in many fields. However, neural networks remain vulnerable to adversarial example attacks. Among all types of adversarial attacks, the boundary black-box attack can obtain only the final classification label of the target model, so it is the closest to real application scenarios. It is recognized as the most practically relevant and the most difficult attack to carry out, and it has attracted growing attention from researchers. However, current research focuses mainly on image recognition models, and video recognition models have received little study. To this end, this paper proposes BBVA, a boundary-based black-box video adversarial example generation algorithm. BBVA uses a progressive exploration mechanism to generate adversarial videos, which effectively improves the efficiency of sample generation. Experiments show that, compared with STDE, the state-of-the-art boundary black-box video adversarial example generation algorithm, BBVA better balances noise magnitude against the number of model queries and achieves the best results in this research field on several metrics, including visual quality, optimization distance, and fooling rate. In addition, under more restrictive conditions, BBVA even outperforms some state-of-the-art score-based black-box video adversarial example generation algorithms, such as EARL and VBAD. The proposed algorithm can be used to provide adversarial training samples and thereby improve the security of video models.

Key words: Adversarial example, Video recognition, Boundary, Black-box, Neural networks
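
To make the boundary (hard-label) black-box setting in the abstract concrete, the sketch below shows a generic binary search toward the decision boundary using label-only queries. It is not BBVA's progressive exploration mechanism, which the abstract does not specify. The classifier `toy_label`, the flattened-list video representation, and every function name here are hypothetical, chosen only for illustration.

```python
import random
from typing import Callable, List, Tuple

Video = List[float]  # assumption: a video flattened to pixel values in [0, 1]


def toy_label(video: Video) -> int:
    """Hypothetical hard-label oracle: returns a class index only, no scores."""
    return 1 if sum(video) / len(video) > 0.5 else 0


def blend(a: Video, b: Video, t: float) -> Video:
    """Point on the segment from a (t = 0) to b (t = 1)."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def l2(a: Video, b: Video) -> float:
    """Euclidean distance, used here as the noise magnitude."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def boundary_search(
    query: Callable[[Video], int],
    original: Video,
    adversarial: Video,
    true_label: int,
    steps: int = 20,
) -> Tuple[Video, int]:
    """Move a misclassified video toward the original by binary search.

    Assumes `adversarial` is already misclassified. Each step costs one
    model query and uses only the returned label.
    """
    lo, hi = 0.0, 1.0  # hi always marks a point known to be misclassified
    queries = 0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        queries += 1
        if query(blend(original, adversarial, mid)) != true_label:
            hi = mid  # still adversarial: move closer to the original
        else:
            lo = mid  # crossed back over the boundary: back off
    return blend(original, adversarial, hi), queries


rng = random.Random(0)
original = [rng.uniform(0.0, 0.4) for _ in range(64)]  # toy_label gives 0
start = [rng.uniform(0.6, 1.0) for _ in range(64)]     # toy_label gives 1
adv, n_queries = boundary_search(toy_label, original, start, true_label=0)
print(n_queries, l2(start, original), l2(adv, original))
```

The search spends one query per step and shrinks the perturbation while keeping the label wrong, which is the noise-versus-queries trade-off the abstract refers to.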

CLC Number: 

  • TP391