Computer Science, 2025, Vol. 52, Issue (11A): 250100139-11. doi: 10.11896/jsjkx.250100139

Image Processing & Multimedia Technology

Real-time Transformer Small Target Detection Model Based on Feature Extraction Enhancement and Pyramid Structure

ZHANG Wei (1,2,3), CAI Yufan (1), YE Lintao (1), LIU Dazhi (1)

  1 College of Artificial Intelligence, Hubei University, Wuhan 430062, China
  2 Key Laboratory of Intelligent Perception Systems and Security of Ministry of Education, Wuhan 430062, China
  3 Hubei Provincial Engineering Research Center for Smart Government Affairs and Artificial Intelligence Application, Wuhan 430062, China
  • Online: 2025-11-15; Published: 2025-11-10
  • About the authors: ZHANG Wei, born in 1979, associate professor, master's supervisor, is a member of CCF (No. Y8013M). His main research interests include computer vision, image processing and artificial intelligence.
    CAI Yufan, born in 1998, postgraduate, is a member of CCF (No. Y4416G). His main research interests include object detection and image processing.
  • Supported by: National Natural Science Foundation of China (62273135).

Abstract: Small target detection in outdoor environments is hampered by complex backgrounds, insufficient lighting, densely packed targets and severe occlusion. To address these challenges, this paper proposes LDSD-DETR, an improved model based on the Real-Time DEtection TRansformer (RT-DETR) that strengthens feature extraction and small target detection against complex backgrounds. To make feature extraction more efficient, linear deformable convolution (LDConv) is used to improve the pooling layer and the downsampling stage. A deformable attention mechanism is introduced into the attention-based intra-scale feature interaction module to better capture features in the regions relevant to the target. For small targets, a small target enhancement pyramid is designed in the cross-scale feature fusion stage to increase the model's sensitivity to them. To further improve performance, the reconstructed structure incorporates DGCST modules, which capture both local and global image features. Experimental results show that LDSD-DETR achieves higher average detection precision on Roboflow100 and its extended dataset than the other models tested. Compared with the original RT-DETR, every metric improves; in particular, mAP50 rises to 90%, a gain of 1.8 percentage points. In addition, the model is optimized in computational cost, parameter count and weight file size, offering a more accurate and efficient solution for real-time small target detection.
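LDConv and deformable attention both descend from deformable convolution (Dai et al., 2017), whose core step reads the feature map at positions shifted by fractional offsets and recovers values there by bilinear interpolation. The sketch below illustrates only that shared sampling rule, y(p0) = Σ w_n · x(p0 + p_n + Δp_n), for a single-channel map and one 3×3 kernel position. It is not the LDSD-DETR implementation: the function names are hypothetical, the offsets are hand-set numbers rather than learned ones, and positions outside the map are treated as zero padding by assumption.

```python
import math


def bilinear_sample(feat, y, x):
    """Read a 2-D feature map (list of rows) at a fractional position (y, x).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    value = 0.0
    # Weighted sum over the four integer neighbours surrounding (y, x).
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * feat[yy][xx]
    return value


def deformable_conv_at(feat, p0, weights, offsets):
    """One output value of a 3x3 deformable convolution centred at p0.

    p_n runs over the regular 3x3 grid; offsets holds one (dy, dx) per tap.
    In a real network the offsets come from a learned branch; here they are
    plain numbers supplied by the caller.
    """
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), w, (oy, ox) in zip(grid, weights, offsets):
        out += w * bilinear_sample(feat, p0[0] + gy + oy, p0[1] + gx + ox)
    return out


# Usage: a 4x4 map whose value at (r, c) is 4r + c, summed with unit weights.
feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(deformable_conv_at(feat, (1, 1), [1.0] * 9, [(0.0, 0.0)] * 9))  # 45.0
print(deformable_conv_at(feat, (1, 1), [1.0] * 9, [(0.5, 0.5)] * 9))  # 67.5
```

With zero offsets the operator reduces to an ordinary 3×3 convolution; shifting every tap by half a pixel moves the receptive field, and each sample becomes a blend of four neighbours. The variants cited in the abstract differ mainly in how the offsets are produced and how the sampled values are combined.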

Key words: Object detection, Small target, RT-DETR, Feature extraction, Pyramid structure, Transformer

CLC Number: TP391