Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250700023-10.doi: 10.11896/jsjkx.250700023

• Information Security •

Black-box Physical Adversarial Attack Against Multimodal Object Detector

ZHENG Haibin1,2,3,4, LIN Xiuhao2, HAN Ye2, CHEN Jinyin1,2, LI Beibei3   

    1 College of Computer Science,Zhejiang University of Technology,Hangzhou 310023,China
    2 College of Information Engineering,Zhejiang University of Technology,Hangzhou 310023,China
    3 Key Laboratory of Data Protection and Intelligent Management,Ministry of Education,Sichuan University,Chengdu 610065,China
    4 Key Laboratory of Beijing Life Science Academy,Beijing 102206,China
  • Online:2026-06-16 Published:2026-06-12
  • About author:ZHENG Haibin,born in 1995,Ph.D,lecturer.His main research interests include deep learning and artificial intelligence security.
    CHEN Jinyin,born in 1982,Ph.D,professor,is a member of CCF(No.14348M).Her main research interests include data mining,intelligent computing and complex network analysis.
  • Supported by:
    National Natural Science Foundation of China(62406286),Zhejiang Provincial Natural Science Foundation(LDQ23F020001),Key Laboratory of Data Protection and Intelligent Management,Ministry of Education,Sichuan University(SCUSAKFKT202402Z) and Beijing Life Science Academy(2024200CD0210).

Abstract: Under complex real-world operating conditions, deep learning-based multimodal (visible, infrared, etc.) object detectors improve detection performance by fusing features from different spectral bands. However, multimodal detectors have been found to be vulnerable to adversarial attacks, which cause the output bounding boxes to deviate severely from the target or to disappear altogether, reducing their reliability in the physical world. Existing work has explored black-box physical adversarial attacks against multimodal deep detectors, but problems remain, such as inefficient attacks across modalities, a limited range of target detectors and fusion strategies, and poor attack deployment in the physical domain. To address these problems, this paper proposes a multimodal adversarial color patch (MAC-Patch) generation method that achieves efficient, general, and robust attacks on multimodal deep detectors. Specifically, a stochastic gradient descent optimizer is used to generate strong adversarial patches against different modalities on a surrogate model; the patches still interfere effectively with the target model in a black-box setting, without access to its internal structure. A patch location optimization method based on differential evolution is proposed to adaptively select the optimal attack location under multiple fusion strategies, object detection models, and defense settings. Finally, the attack effectiveness, generalization, and transferability of MAC-Patch are tested on 2 models, 3 image fusion strategies, and 4 datasets; in the physical domain, expectation over transformation is adopted, and the actual attack effect is evaluated under different ambient brightness levels and patch rotation angles to verify robustness. Experimental results show that MAC-Patch performs best in terms of attack success rate, AP reduction, and other metrics; for example, compared with three advanced attacks (MAP, MIC, and UAP), MAC-Patch improves the AP reduction by 62.6%.
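
As an illustration of the location search described in the abstract, the following is a minimal sketch of differential evolution (the common rand/1/bin variant) over a normalized patch position (x, y). It is not the authors' implementation. The function `toy_detection_score` is a hypothetical stand-in for the black-box detector's confidence when the patch is placed at a given position, and the population size, scale factor `f`, and crossover rate `cr` are illustrative assumptions.

```python
import random


def toy_detection_score(x, y):
    """Hypothetical stand-in for a black-box detector query.

    Returns a value the attacker wants to minimize. In a real attack this
    would be the detector's confidence with the patch centred at (x, y);
    here it is a simple bowl with its minimum at (0.3, 0.7).
    """
    return (x - 0.3) ** 2 + (y - 0.7) ** 2


def differential_evolution(score, bounds, pop_size=20, generations=60,
                           f=0.5, cr=0.9, seed=0):
    """Minimize `score` over the box `bounds` with rand/1/bin DE."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population of candidate patch positions.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(*p) for p in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, none of them the current one.
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = a[d] + f * (b[d] - c[d])   # mutation
                    value = min(max(value, lo), hi)    # keep inside the image
                else:
                    value = pop[i][d]                  # inherit from parent
                trial.append(value)
            trial_fit = score(*trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit

    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


best_xy, best_score = differential_evolution(
    toy_detection_score, [(0.0, 1.0), (0.0, 1.0)]
)
```

Because the search only needs score values, not gradients, it suits a black-box setting in which the target detector can be queried but not inspected.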

Key words: Object detection, Multimodality, Adversarial attack, Physical attack, Defense

CLC Number: TP391.4