Boundary Black-box Adversarial Example Generation Algorithm on Video Recognition Models

doi:10.11896/jsjkx.240700045

Abstract

With the rapid development of deep learning, neural networks are widely used in many fields, yet they remain vulnerable to adversarial attacks. Among all types of adversarial attack, the boundary black-box attack can obtain only the final classification label of the target model. It is therefore the closest to real application scenarios and is regarded as the most practical and most difficult type of attack, which has drawn growing research attention. However, existing work focuses mainly on image recognition models, and video recognition models have received far less study. To this end, this paper proposes BBVA, a boundary black-box algorithm for generating video adversarial examples. BBVA uses a progressive exploration mechanism to generate adversarial videos, which effectively improves the efficiency of sample updates. Experiments show that, compared with STDE, the state-of-the-art boundary black-box video adversarial example generation algorithm, BBVA better balances noise magnitude against the number of model queries and achieves the best results in this research area on several metrics, including visual quality, optimization distance, and fooling rate. In addition, under stricter conditions, BBVA even outperforms some state-of-the-art score-based black-box video adversarial example generation algorithms, such as EARL and VBAD. The proposed algorithm can be used to provide adversarial training samples that enhance the security of video models.
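The label-only setting described in the abstract can be illustrated with a small sketch. This is not BBVA or its progressive exploration mechanism, which the abstract does not specify; it is the generic binary-search step that decision-based attacks commonly use to pull a misclassified starting video toward the clean one while querying only the predicted label. The names `predict_label`, `boundary_binary_search`, and `toy_model` are hypothetical, and a video is assumed to be a flat list of pixel values in [0, 1] for brevity.

```python
from typing import Callable, List, Tuple

# Assumption: a video is a flat list of pixel values in [0, 1],
# a toy stand-in for a frames x height x width x channels clip.
Video = List[float]


def l2_distance(a: Video, b: Video) -> float:
    """Euclidean distance between two videos of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def blend(clean: Video, adv: Video, alpha: float) -> Video:
    """Return (1 - alpha) * clean + alpha * adv, element by element."""
    return [(1.0 - alpha) * c + alpha * a for c, a in zip(clean, adv)]


def boundary_binary_search(
    predict_label: Callable[[Video], int],
    clean: Video,
    adv_start: Video,
    tol: float = 1e-3,
) -> Tuple[Video, int]:
    """Shrink the perturbation using only hard-label queries.

    Returns the adversarial video closest to `clean` on the segment
    between `clean` and `adv_start`, plus the number of model queries.
    """
    true_label = predict_label(clean)
    queries = 1
    if predict_label(adv_start) == true_label:
        raise ValueError("adv_start must already be misclassified")
    queries += 1

    lo, hi = 0.0, 1.0  # alpha = 0 is the clean video, alpha = 1 is adv_start
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        queries += 1
        if predict_label(blend(clean, adv_start, mid)) != true_label:
            hi = mid  # still adversarial: move closer to the clean video
        else:
            lo = mid  # crossed back over the decision boundary
    return blend(clean, adv_start, hi), queries


def toy_model(video: Video) -> int:
    """Hypothetical hard-label classifier: class 1 if mean brightness > 0.5."""
    return int(sum(video) / len(video) > 0.5)


clean_video = [0.2] * 16
start_video = [0.9] * 16
adv_video, n_queries = boundary_binary_search(toy_model, clean_video, start_video)
```

Each loop iteration costs exactly one query, which is why query count and remaining noise trade off against each other in this setting: a tighter `tol` gives a smaller perturbation but more queries.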

Key words: Adversarial example, Video recognition, Boundary, Black-box, Neural networks

CLC Number: TP391

JING Yulin, WU Lijun, LI Zhiyuan, DENG Qi. Boundary Black-box Adversarial Example Generation Algorithm on Video Recognition Models [J]. Computer Science, 2025, 52(10): 366-373.


