Real-time Transformer Small Target Detection Model Based on Feature Extraction Enhancement and Pyramid Structure

doi:10.11896/jsjkx.250100139

Computer Science, 2025, Vol. 52, Issue (11A): 250100139-11.

• Computer Graphics & Multimedia •

ZHANG Wei¹,²,³, CAI Yufan¹, YE Lintao¹, LIU Dazhi¹

1 College of Artificial Intelligence, Hubei University, Wuhan 430062, China
2 Key Laboratory of Intelligent Perception Systems and Security of Ministry of Education, Wuhan 430062, China
3 Hubei Provincial Engineering Research Center for Smart Government Affairs and Artificial Intelligence Application, Wuhan 430062, China

Online: 2025-11-15 Published: 2025-11-10
Corresponding author: CAI Yufan (caiyufan0622@foxmail.com)
About author: zhang_wei@mail.hubu.edu
Supported by:
National Natural Science Foundation of China (62273135).

Abstract

Abstract: Small target detection in outdoor environments faces challenges such as complex backgrounds, insufficient lighting, dense targets, and severe occlusion. To address them, LDSD-DETR, an improved model based on the real-time detection Transformer (RT-DETR), is proposed to strengthen feature extraction and small target detection against complex backgrounds. To improve feature extraction efficiency, the pooling layer and the downsampling part are improved with linear deformable convolution (LDConv), and a deformable attention mechanism is introduced into the attention-based intra-scale feature interaction part to better capture features in target-relevant regions. For small target detection, a small target enhancement pyramid is designed in the cross-scale feature fusion part to increase sensitivity to small-sized targets. To further improve performance, the reconstructed structure incorporates the DGCST module, which captures both local and global features of the image. Experimental results show that the average detection accuracy of LDSD-DETR on Roboflow100 and its extended dataset is higher than that of the other tested models. Compared with the original model, all metrics improve; in particular, mAP50 rises to 90%, an increase of 1.8 percentage points. In addition, the model is optimized in terms of computation, parameter count, and weight file size, providing a more accurate and efficient solution for real-time detection of small targets.

Key words: Object detection, Small target, RT-DETR, Feature extraction, Pyramid structure, Transformer
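The mAP50 figure in the abstract counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below is only an illustration of that criterion and of the percentage-point arithmetic; it is not the authors' evaluation code. The function names `iou`, `is_true_positive`, and `implied_baseline`, and all box coordinates, are hypothetical. The baseline calculation assumes the reported 90% is not a rounded value.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match at the mAP50 threshold when IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold


def implied_baseline(new_score, gain_points):
    """Baseline score implied by a new score and a gain in percentage points."""
    return round(new_score - gain_points, 1)


# Hypothetical boxes: the same 2-pixel shift on a small and a large target.
small_gt, small_pred = (10, 10, 20, 20), (12, 12, 22, 22)
large_gt, large_pred = (10, 10, 110, 110), (12, 12, 112, 112)

print(is_true_positive(small_pred, small_gt))  # 10x10 target: IoU = 64/136, below 0.5
print(is_true_positive(large_pred, large_gt))  # 100x100 target: IoU = 9604/10396, above 0.5
print(implied_baseline(90.0, 1.8))             # baseline mAP50 implied by the abstract
```

The two hypothetical boxes show why small targets are sensitive to localization error: an offset that barely affects a large box can push a small box below the 0.5 IoU threshold.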

CLC Number: TP391

ZHANG Wei, CAI Yufan, YE Lintao, LIU Dazhi. Real-time Transformer Small Target Detection Model Based on Feature Extraction Enhancement and Pyramid Structure[J]. Computer Science, 2025, 52(11A): 250100139-11. https://doi.org/10.11896/jsjkx.250100139

