Computer Science ›› 2025, Vol. 52 ›› Issue (2): 173-182. doi: 10.11896/jsjkx.240300068

• Computer Graphics & Multimedia •

Illumination-aware Infrared/Visible Fusion for Object Detection

CHENG Qinghua1,2, JIAN Haifang1, ZHENG Shuaikang1, GUO Huimin1,2, LI Yuehao1,2   

  1. Laboratory of Solid State Optoelectronics Information Technology, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
    2. College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-03-11 Revised: 2024-07-18 Online: 2025-02-15 Published: 2025-02-17
  • Corresponding author: JIAN Haifang (jhf@semi.ac.cn)
  • About author: CHENG Qinghua, born in 1999, postgraduate (chengqinghua@semi.ac.cn). His main research interests include multi-spectral image processing algorithms.
    JIAN Haifang, born in 1978, Ph.D., researcher, is a member of CCF (No. O2087M). His main research interests include intelligent information processing algorithms and systems.
  • Supported by:
    Progress of Strategy Priority Research Program of Chinese Academy of Sciences(XDB0460000).

Abstract: Infrared/visible fusion can substantially improve object detection in open scenarios such as road traffic and security surveillance. Existing methods rarely design feature-interaction mechanisms around the differences between the infrared and visible modalities, which limits detection accuracy and robustness. This paper therefore proposes a dual-stream infrared/visible image fusion network that explicitly accounts for inter-modality differences and achieves accurate object recognition in open environments by extracting and fusing multi-level features from each modality. To address the sensitivity of visible-image quality to changes in ambient illumination, a lightweight illumination-aware module is designed that dynamically adjusts the infrared/visible fusion weights through a weight-allocation function, improving the adaptability and accuracy of the fusion algorithm. In addition, a parameter-free 3D attention module automatically estimates the channel and spatial importance of the extracted features and assigns fusion weights according to each modality's importance, improving fusion quality without increasing the number of network parameters. A near-infrared/visible dataset (NRS) is also constructed, providing additional multi-source data for object recognition tasks in open scenes. Experiments on the self-built NRS dataset and the public M3FD dataset show that the proposed method reaches 93.5% and 92.2% mAP@0.5 respectively, supporting accurate object detection and recognition in open scenes.
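
For readers who want a concrete picture of the two fusion mechanisms the abstract names, below is a minimal PyTorch sketch. It is not the authors' implementation: the gate architecture, the 56×56 downsampling, the day/night softmax head, and the complementary weighting scheme are illustrative assumptions, while the parameter-free attention follows the published SimAM energy formulation that the paper builds on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def simam(x: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    # Parameter-free 3D attention (SimAM-style energy): every position gets a
    # joint channel/spatial weight without adding any learnable parameters.
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n          # per-channel variance
    return x * torch.sigmoid(d / (4 * (v + eps)) + 0.5)


class IlluminationGate(nn.Module):
    # Lightweight illumination-aware module (architecture is an assumption):
    # predicts a day/night score from a downsampled RGB frame and turns it
    # into complementary fusion weights for the visible and infrared streams.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),                        # logits: [day, night]
        )

    def forward(self, rgb: torch.Tensor):
        small = F.interpolate(rgb, size=(56, 56), mode="bilinear",
                              align_corners=False)
        p_day = self.net(small).softmax(dim=1)[:, :1].view(-1, 1, 1, 1)
        return p_day, 1.0 - p_day                    # (w_visible, w_infrared)


class FusionBlock(nn.Module):
    # Fuses same-resolution features from the two backbone streams using the
    # illumination weights and the parameter-free attention defined above.
    def __init__(self, channels: int):
        super().__init__()
        self.gate = IlluminationGate()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_vis, feat_ir, rgb):
        w_vis, w_ir = self.gate(rgb)
        fused = torch.cat([simam(feat_vis) * w_vis,
                           simam(feat_ir) * w_ir], dim=1)
        return self.mix(fused)


if __name__ == "__main__":
    rgb = torch.rand(2, 3, 224, 224)                 # raw visible frame
    f_vis = torch.rand(2, 64, 28, 28)                # visible-stream feature
    f_ir = torch.rand(2, 64, 28, 28)                 # infrared-stream feature
    print(FusionBlock(64)(f_vis, f_ir, rgb).shape)   # torch.Size([2, 64, 28, 28])
```

The key property this sketch illustrates is that `simam` adds no learnable parameters, while the gate stays lightweight by operating on a heavily downsampled copy of the RGB frame; both match the abstract's claims of parameter-free attention and low-cost illumination sensing.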

Key words: Object detection, Multi-spectral image fusion, Near-infrared image, Illumination-aware, Attention mechanism

CLC number: TP391