Computer Science, 2026, Vol. 53, Issue (6): 263-269. doi: 10.11896/jsjkx.250700103

• Computer Graphics & Multimedia •

Object Detection Method Based on Dynamic Feature Fusion

LIU Jikang1, HUANG Lei1, ZHANG Ke1, NIE Jie1, WEI Zhiqiang1,2   

  1 Faculty of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266000, China
  2 College of Computer Science & Technology, Qingdao University, Qingdao, Shandong 266001, China
  • Received: 2025-07-16 Revised: 2025-10-20 Online: 2026-06-15 Published: 2026-06-09
  • About author: LIU Jikang, born in 1998, postgraduate, is a member of CCF (No. A01785G). His main research interests include computer vision and object detection.
    HUANG Lei, born in 1983, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 17587M). His main research interests include multimedia content analysis and retrieval, computer vision, machine learning, and marine big data analysis.
  • Supported by:
    National Natural Science Foundation of China (62472390) and Shandong Provincial Natural Science Foundation (ZR2023MF033).

Abstract: Object detection is a foundational task in computer vision, with wide-ranging applications in autonomous driving, intelligent transportation, and medical diagnosis. DETR (DEtection TRansformer) pioneered an end-to-end Transformer-based detection framework that eliminates hand-crafted components. However, the strong inter-layer dependencies in its decoder give rise to cascading negative optimization during decoding, which impedes both training efficiency and accuracy. To address this limitation, this paper proposes DFF DETR (Dynamic Feature Fusion DETR), which dynamically fuses cross-layer features so that object queries propagate effectively across decoder layers and rely less on the outputs of preceding layers. In addition, it introduces an inter-layer supervisory signal during backpropagation to refine intermediate query representations and correct suboptimal early outputs. Extensive experiments on various DETR-based detectors demonstrate that integrating DFF DETR yields an average mAP improvement of approximately 1.0%, with particularly pronounced gains in small-object detection.

Key words: Object detection, Feature fusion, Supervision signal, DEtection TRansformer, Object query

CLC Number: 

  • TP391.4
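
The abstract describes fusing object-query features across decoder layers so that a layer does not depend only on the output immediately before it. The abstract gives no equations, so the following is only a minimal illustrative sketch of one possible form of such fusion, not the DFF DETR formulation: the function name `fuse_queries` and the similarity-weighted softmax are assumptions introduced here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_queries(layer_outputs):
    """Fuse one object query's representations from several decoder layers.

    Hypothetical helper, not taken from the paper.
    layer_outputs: list of equal-length float vectors, ordered from the
    first decoder layer to the most recent one.
    Returns (fused_vector, per_layer_weights).
    """
    latest = layer_outputs[-1]
    dim = len(latest)
    # Assumption: score each layer by scaled dot-product similarity
    # to the most recent layer's output.
    scores = [
        sum(a * b for a, b in zip(out, latest)) / math.sqrt(dim)
        for out in layer_outputs
    ]
    weights = softmax(scores)
    # The fused query is the weighted sum over all layers, so earlier
    # outputs contribute directly instead of only through the chain.
    fused = [
        sum(w * out[i] for w, out in zip(weights, layer_outputs))
        for i in range(dim)
    ]
    return fused, weights


# Toy usage: one query, three decoder layers, feature dimension 2.
outputs = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
fused, weights = fuse_queries(outputs)
```

Because the weights are computed from the features themselves, the mix of layers changes per query, which is one reading of "dynamic" fusion; a learned gating network would be another.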