Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250700022-9. doi: 10.11896/jsjkx.250700022

• Image Processing & Multimedia Technology •

Review of Small Object Detection Based on Deep Learning

CHEN Nuo, ZHAO Peng, HUAN Haisheng   

  1. College of Operational Support, Rocket Force University of Engineering, Xi'an 710025, China
  • Online: 2026-06-16 Published: 2026-06-12
  • About author: CHEN Nuo, born in 1999, postgraduate. Her main research interest is small target detection from the UAV perspective.
    ZHAO Peng, born in 1979, Ph.D., associate professor. His main research interests include intelligent information processing, recommendation systems, and distributed computing.

Abstract: Small object detection is both a key difficulty and an important branch of object detection. Because small objects are tiny, have blurred features, and are easily disturbed by background clutter, the task has long been a research hotspot in computer vision. In recent years, the rapid development of convolutional neural networks has significantly improved small object detection performance. This paper comprehensively reviews deep learning methods for object detection. It first summarizes the main challenges of small object detection: the loss of spatial information during feature extraction, the limited information available from the target itself, and the weak generalization ability of models caused by an insufficient number of annotated samples. In response to these problems, it then analyzes methods and optimization strategies for small object detection. Next, it examines typical application scenarios such as autonomous driving, UAV detection, and medical imaging, and discusses in detail the practical applications and innovative achievements of the related detection methods. Finally, it looks ahead to future research directions for small object detection, providing guidance for subsequent work.

Key words: Small target detection, Deep learning, Object detection, Computer vision

CLC Number: 

  • TP379