Computer Science, 2025, Vol. 52, Issue (2): 173-182. doi: 10.11896/jsjkx.240300068

Computer Graphics & Multimedia

Illumination-aware Infrared/Visible Fusion for Object Detection

CHENG Qinghua1,2, JIAN Haifang1, ZHENG Shuaikang1, GUO Huimin1,2, LI Yuehao1,2   

  1 Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
  2 College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-03-11  Revised: 2024-07-18  Online: 2025-02-15  Published: 2025-02-17
  • About author: CHENG Qinghua, born in 1999, postgraduate. His main research interests include multi-spectral image processing algorithms.
    JIAN Haifang, born in 1978, Ph.D., researcher, is a member of CCF (No. O2087M). His main research interests include intelligent information processing algorithms and systems.
  • Supported by: Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0460000).

Abstract: Infrared/visible fusion can effectively improve object detection in open scenarios such as road traffic and security monitoring. Existing methods, however, rarely design feature-interaction mechanisms that account for the differences between infrared and visible images, which limits detection accuracy and robustness. This paper therefore proposes a dual-stream infrared/visible fusion network that explicitly considers the differences between modalities and achieves accurate object recognition in open environments by extracting and fusing multi-level features from each modality. Because the quality of visible images is easily degraded by changes in ambient illumination, a lightweight illumination-aware module is designed that dynamically adjusts the infrared/visible fusion weights through a weight allocation function, improving the adaptability and accuracy of the fusion algorithm. In addition, a parameter-free 3D attention module is designed that automatically estimates the channel and spatial importance of the extracted features and assigns fusion weights to each modality according to its importance, improving fusion without increasing the number of network parameters. This paper also constructs a near-infrared/visible dataset (NRS), which provides additional source data for object recognition in open scenes. Experiments on the self-constructed NRS dataset and the public M3FD dataset show that the proposed method reaches detection accuracies of 93.5% and 92.2% (mAP@0.5), respectively, supporting accurate object detection and recognition in open scenes.
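The abstract does not specify the internals of the illumination-aware module or the form of its weight allocation function. The sketch below is only an illustration under stated assumptions: it replaces the learned lightweight module with a hand-crafted illumination cue (the mean Rec. 601 luma of the visible image) and uses a hypothetical sigmoid weight allocation function, with made-up `midpoint` and `steepness` parameters, to produce complementary weights (w_vis + w_ir = 1) that scale the visible and infrared feature maps before they are summed. All names are illustrative and are not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]      # (R, G, B), each in [0, 1]
FeatureMap = List[List[List[float]]]    # indexed [channel][row][col]


def mean_luminance(rgb: Sequence[Sequence[Pixel]]) -> float:
    """Average Rec. 601 luma of a visible image, used here as a crude
    stand-in for a learned illumination estimate (assumption)."""
    total, count = 0.0, 0
    for row in rgb:
        for r, g, b in row:
            total += 0.299 * r + 0.587 * g + 0.114 * b
            count += 1
    return total / count


def allocate_weights(illum: float, midpoint: float = 0.35,
                     steepness: float = 12.0) -> Tuple[float, float]:
    """Hypothetical weight allocation function: a sigmoid maps the
    illumination score to (w_vis, w_ir) with w_vis + w_ir = 1.
    Brighter scenes shift weight toward the visible stream."""
    w_vis = 1.0 / (1.0 + math.exp(-steepness * (illum - midpoint)))
    return w_vis, 1.0 - w_vis


def fuse(f_vis: FeatureMap, f_ir: FeatureMap,
         w_vis: float, w_ir: float) -> FeatureMap:
    """Element-wise weighted sum of two feature maps of identical shape."""
    return [
        [[w_vis * a + w_ir * b for a, b in zip(row_v, row_i)]
         for row_v, row_i in zip(ch_v, ch_i)]
        for ch_v, ch_i in zip(f_vis, f_ir)
    ]


if __name__ == "__main__":
    night = [[(0.05, 0.05, 0.08)] * 4 for _ in range(4)]   # toy dark frame
    day = [[(0.8, 0.7, 0.6)] * 4 for _ in range(4)]        # toy bright frame
    for name, img in (("night", night), ("day", day)):
        w_vis, w_ir = allocate_weights(mean_luminance(img))
        print(f"{name}: w_vis={w_vis:.3f}, w_ir={w_ir:.3f}")
```

With these defaults, a dark frame pushes almost all of the weight onto the infrared stream and a bright frame onto the visible stream. In a trained network the illumination score would come from the learned module rather than from a fixed luma formula.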

Key words: Object detection, Multi-spectral image fusion, Near-infrared image, Illumination-aware, Attention mechanism

CLC Number: TP391
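The abstract describes the 3D attention module only as parameter-free and as scoring channel and spatial importance jointly. One published module with exactly these properties is SimAM (Yang et al., ICML 2021), which derives a per-neuron weight from an energy function; the sketch below implements SimAM's formula in plain Python as a stand-in, not as the authors' module. The `modality_weights` rule, which turns the attended responses of each stream into per-modality fusion weights, is a hypothetical illustration of "assigning fusion weights according to importance".

```python
import math
from typing import List, Tuple

Channel = List[List[float]]      # indexed [row][col]
FeatureMap = List[Channel]       # indexed [channel][row][col]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simam(x: FeatureMap, e_lambda: float = 1e-4) -> FeatureMap:
    """SimAM parameter-free 3D attention.

    For each channel, every neuron gets an inverse-energy score
    (x - mu)^2 / (4 * (var + lambda)) + 0.5 and is rescaled by
    sigmoid(score). Each channel needs at least 2 spatial positions."""
    out: FeatureMap = []
    for ch in x:
        values = [v for row in ch for v in row]
        n = len(values) - 1                      # SimAM normalises by H*W - 1
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / n
        out.append([
            [v * _sigmoid((v - mu) ** 2 / (4.0 * (var + e_lambda)) + 0.5)
             for v in row]
            for row in ch
        ])
    return out


def modality_weights(att_vis: FeatureMap,
                     att_ir: FeatureMap) -> Tuple[float, float]:
    """Hypothetical rule: softmax over the mean absolute attended response
    of each modality, so the stream with stronger salient activations
    receives the larger fusion weight."""
    def mean_abs(f: FeatureMap) -> float:
        vals = [abs(v) for ch in f for row in ch for v in row]
        return sum(vals) / len(vals)

    a, b = mean_abs(att_vis), mean_abs(att_ir)
    m = max(a, b)                                # subtract max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)


if __name__ == "__main__":
    f_vis = [[[0.1, 0.2], [0.1, 0.3]]]           # toy 1x2x2 visible features
    f_ir = [[[0.0, 0.0], [0.0, 2.0]]]            # toy infrared features, one strong response
    a_vis, a_ir = simam(f_vis), simam(f_ir)
    w_vis, w_ir = modality_weights(a_vis, a_ir)
    fused = [[[w_vis * p + w_ir * q for p, q in zip(rv, ri)]
              for rv, ri in zip(cv, ci)] for cv, ci in zip(a_vis, a_ir)]
    print(f"w_vis={w_vis:.3f}, w_ir={w_ir:.3f}")
```

Because SimAM has no learnable tensors, inserting it into each stream adds computation but no parameters, which is consistent with the abstract's claim that the attention module improves fusion without increasing the parameter count.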