Computer Science (计算机科学), 2025, 52(2): 173-182. doi: 10.11896/jsjkx.240300068
CHENG Qinghua¹,², JIAN Haifang¹, ZHENG Shuaikang¹, GUO Huimin¹,², LI Yuehao¹,²
Abstract: Methods based on infrared/visible image fusion can effectively improve object detection in open scenes such as road traffic and security surveillance. Existing methods, however, rarely design feature-interaction mechanisms tailored to the differences between the infrared and visible modalities, which limits detection accuracy and robustness. This paper therefore proposes a dual-stream infrared/visible image fusion network that fully accounts for inter-modality differences, extracting and fusing multi-level features from both modalities to achieve accurate object recognition in open environments. To address the sensitivity of visible-image quality to changes in ambient illumination, a lightweight illumination-aware module is designed: it dynamically adjusts the infrared/visible fusion weights through a weight-allocation function, improving the adaptability and accuracy of the fusion algorithm. In addition, a parameter-free 3D attention module is designed to automatically estimate the channel and spatial importance of the extracted features and to assign fusion weights according to the relative importance of each modality, improving fusion quality without increasing the number of network parameters. Furthermore, a near-infrared/visible dataset (NRS) is constructed, providing additional multi-source data for object recognition tasks in open scenes. Finally, the model is evaluated on the self-built NRS dataset and the public M3FD dataset, achieving detection accuracies of 93.5% and 92.2% (mAP@0.50) respectively, supporting accurate object detection and recognition in open scenes.
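The abstract names two concrete mechanisms: an illumination-aware module that converts a day/night estimate into per-modality fusion weights, and a parameter-free 3D attention module that scores every channel/spatial position of a feature map. The paper does not reproduce its implementation here, so the following is a minimal PyTorch sketch under stated assumptions: the attention follows the published SimAM energy formulation (the standard parameter-free 3D attention), and the illumination branch follows the common day/night-probability design used by illumination-aware fusion networks such as PIAFusion. All module, argument, and tensor names (IlluminationAwareFusion, feat_channels, vis_feat, ir_feat, ...) are illustrative, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterFreeAttention3D(nn.Module):
    """SimAM-style parameter-free attention: weights every (channel, spatial)
    position by an energy function, adding no learnable parameters."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # small constant stabilizing the energy term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        # Squared deviation of each position from its channel's spatial mean.
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # Inverse energy: positions that stand out from their channel get
        # higher attention (SimAM closed-form solution).
        v = d.sum(dim=(2, 3), keepdim=True) / n
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

class IlluminationAwareFusion(nn.Module):
    """Hypothetical illumination-aware fusion: a tiny classifier estimates
    day/night probabilities from the visible frame and turns them into
    per-modality fusion weights, then refines the fused map with the
    parameter-free attention above."""
    def __init__(self, feat_channels: int = 64):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 2),  # logits for [day, night]
        )
        self.attn = ParameterFreeAttention3D()

    def forward(self, vis_img, vis_feat, ir_feat):
        # vis_img: (B, 3, H, W); vis_feat/ir_feat: (B, C, H', W') backbone features.
        p = F.softmax(self.classifier(vis_img), dim=1)   # (B, 2)
        w_vis = p[:, 0].view(-1, 1, 1, 1)                # trust visible in daylight
        w_ir = p[:, 1].view(-1, 1, 1, 1)                 # trust infrared at night
        fused = w_vis * vis_feat + w_ir * ir_feat
        return self.attn(fused)

Because the attention branch adds no learnable parameters, it can be applied at every fusion stage of a multi-level network without increasing model size, which matches the abstract's claim of improving fusion quality at zero parameter cost.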