Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250700088-7. doi: 10.11896/jsjkx.250700088

• Image Processing & Multimedia Technology •

Object Detection Method Based on Phased Training Strategy and Multi-scale Feature Fusion

QU Jiewu1, LU Xinxi2, SUN Jian1, LIU Yan1, GAO Ling1, XU Binbin1   

  1 Pipechina Digital Co., Ltd., Beijing 102200, China
  2 School of Software, Beihang University, Beijing 100191, China
  • Online: 2026-06-16  Published: 2026-06-12
  • About author: QU Jiewu, born in 1985. His main research interests include network communication and artificial intelligence.
    LU Xinxi, born in 1978, Ph.D., associate professor, master's supervisor. His main research interests include intelligent software and artificial intelligence.

Abstract: The DETR (Detection Transformer) family of object detection methods suffers from computational bottlenecks during inference and struggles to balance real-time performance with accuracy. To overcome these limitations, this paper proposes an enhanced approach that combines a phased training strategy with multi-scale feature fusion. Specifically, the multi-layer encoder structure of DETR is simplified to reduce computational complexity, while the phased training strategy improves feature representation and accelerates model convergence. In the first phase, one-to-many label matching is adopted to obtain high-quality two-dimensional multi-scale features. In the second phase, the weights from the first phase are frozen, and a parallel attention-convolutional fusion module is introduced to further refine the features. Experimental results demonstrate that, on the COCO dataset, the proposed method achieves a 5× increase in inference speed and a 1.5-point AP gain over the baseline model, effectively alleviating DETR's inference inefficiency. In addition, it yields a 1.4-point AP improvement on the BitVehicle dataset.
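
The one-to-many label matching used in the first phase can be illustrated with a minimal sketch. The abstract does not state the matching rule, so the version below is an assumption for illustration only: each ground-truth box is assigned up to `k` predictions ranked by IoU. The names `iou`, `one_to_many_match`, `k`, and `min_iou` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_many_match(
    preds: List[Box], gts: List[Box], k: int = 2, min_iou: float = 0.1
) -> List[List[int]]:
    """For each ground-truth box, return the indices of up to k predictions
    with the highest IoU (assumed rule; the paper's criterion may differ)."""
    matches: List[List[int]] = []
    for gt in gts:
        # Score every prediction against this ground truth.
        scored = [(iou(p, gt), i) for i, p in enumerate(preds)]
        # Discard predictions that barely overlap the ground truth.
        scored = [(s, i) for s, i in scored if s >= min_iou]
        # Highest IoU first; ties broken by prediction index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        matches.append([i for _, i in scored[:k]])
    return matches


preds = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 9, 9)]
gts = [(0, 0, 10, 10), (50, 50, 60, 60)]
print(one_to_many_match(preds, gts, k=2))
```

In contrast to the one-to-one Hungarian matching of the original DETR, each ground truth here can supervise several predictions, which gives denser training signal. This simple sketch does not prevent one prediction from being assigned to more than one ground truth.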

Key words: Object detection, Parallel attention convolution, Phased training strategy, Multi-scale feature fusion

CLC Number: 

  • TP311