Computer Science, 2025, Vol. 52, Issue (11A): 250100139-11. doi: 10.11896/jsjkx.250100139

Image Processing & Multimedia Technology

Real-time Transformer Small Target Detection Model Based on Feature Extraction Enhancement and Pyramid Structure

ZHANG Wei (1,2,3), CAI Yufan (1), YE Lintao (1), LIU Dazhi (1)

1 College of Artificial Intelligence, Hubei University, Wuhan 430062, China
2 Key Laboratory of Intelligent Perception Systems and Security of Ministry of Education, Wuhan 430062, China
3 Hubei Provincial Engineering Research Center for Smart Government Affairs and Artificial Intelligence Application, Wuhan 430062, China
Online: 2025-11-15; Published: 2025-11-10
Supported by: National Natural Science Foundation of China (62273135).

Abstract: To address the challenges of small target detection in outdoor environments, such as complex backgrounds, insufficient lighting, dense targets and severe occlusion, an improved model based on the real-time detection Transformer (RT-DETR), named LDSD-DETR, is proposed to strengthen feature extraction and small target detection against complex backgrounds. To make feature extraction more efficient, linear deformable convolution (LDConv) is used to improve the pooling layer and the downsampling part so that features are extracted more effectively. A deformable attention mechanism is introduced into the attention-based intra-scale feature interaction (AIFI) module to better capture features in the regions relevant to the target. For small targets, a small target enhancement pyramid is designed in the cross-scale feature fusion module to increase sensitivity to small targets. To further improve performance, the reconstructed structure incorporates DGCST modules, which capture both local and global features of the image. Experimental results show that the average detection accuracy of LDSD-DETR on Roboflow100 and its extended dataset is higher than that of the other models tested. Compared with the original model, all metrics improve; in particular, mAP50 rises to 90%, an increase of 1.8 percentage points. In addition, the model is optimized in terms of computational cost, parameter count and weight file size, providing a more accurate and efficient solution for real-time small target detection.
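Two of the changes above (LDConv in the backbone and deformable attention in the encoder) rest on the same primitive: reading a feature map at fractional positions p0 + p_n + Δp_n with bilinear interpolation and combining the samples with weights. The sketch below is not the authors' LDSD-DETR code; it is a minimal pure-Python illustration, assuming a single-channel feature map stored as a list of rows. The `initial_points` layout is a hypothetical simplification in the spirit of LDConv, which allows kernels with any number of sampling points rather than only k×k grids. In the real models the offsets (and, for attention, the weights) are predicted by learned layers; here they are passed in by hand.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Point = Tuple[float, float]


def initial_points(n: int) -> List[Tuple[int, int]]:
    """Hypothetical regular layout for an n-point kernel (n need not be a square).

    Points fill rows of width round(sqrt(n)); any remainder goes into one extra row.
    """
    cols = max(1, round(math.sqrt(n)))
    rows, extra = divmod(n, cols)
    points = [(r, c) for r in range(rows) for c in range(cols)]
    points += [(rows, c) for c in range(extra)]
    return points


def bilinear_sample(feature: Grid, y: float, x: float) -> float:
    """Read `feature` at a fractional (y, x); neighbours outside the map count as 0."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    value = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * feature[yy][xx]
    return value


def deformable_response(
    feature: Grid,
    center: Point,
    base_points: Sequence[Tuple[int, int]],
    offsets: Sequence[Point],
    weights: Sequence[float],
) -> float:
    """Weighted sum of samples taken at center + p_n + delta_p_n for each point n."""
    if not (len(base_points) == len(offsets) == len(weights)):
        raise ValueError("base_points, offsets and weights must have the same length")
    cy, cx = center
    total = 0.0
    for (py, px), (oy, ox), wgt in zip(base_points, offsets, weights):
        total += wgt * bilinear_sample(feature, cy + py + oy, cx + px + ox)
    return total


if __name__ == "__main__":
    fmap = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    pts = initial_points(5)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
    ones = [1.0] * len(pts)
    # Zero offsets: the kernel reads the fixed grid positions.
    print(deformable_response(fmap, (0, 0), pts, [(0.0, 0.0)] * len(pts), ones))  # 14.0
    # Non-zero offsets: every sampling point drifts to a fractional location.
    print(deformable_response(fmap, (0, 0), pts, [(0.5, 0.5)] * len(pts), ones))  # 19.25
```

With all offsets at zero the operator reduces to an ordinary fixed-grid kernel; non-zero offsets let each sampling point leave the grid, which is the property that deformable operators use to concentrate sampling on target regions. A production version would batch this over channels and positions (for example with a framework's grid-sampling operator) instead of looping in Python.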

Key words: Object detection, Small target, RT-DETR, Feature extraction, Pyramid structure, Transformer

CLC Number: TP391