Computer Science, 2025, Vol. 52, Issue (4): 202-211. doi: 10.11896/jsjkx.240500042

• Computer Graphics & Multimedia •

An Improved YOLOv8 Object Detection Algorithm for UAV Aerial Images

HU Huijuan1, QIN Yifeng1, XU He1,2 and LI Peng1,2

  1 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  2 Jiangsu HPC and Intelligent Processing Engineer Research Center, Nanjing 210023, China
  • Received: 2024-05-10 Revised: 2025-02-17 Online: 2025-04-15 Published: 2025-04-14
  • About author: HU Huijuan, born in 1975, master, lecturer. Her main research interests include artificial intelligence and Internet of Things technology.
    XU He, born in 1985, Ph.D., professor, master's supervisor, is a senior member of CCF (No. 19957S). His main research interests include big data and Internet of Things technology.
  • Supported by:
    National Key Research and Development Program of China (2019YFB2103003).

Abstract: Object detection in UAV aerial images must cope with diverse target scales, complex backgrounds, densely clustered small targets, and the limited computing resources of drone platforms. To address these problems, an improved YOLOv8 object detection algorithm, YOLOv8-CEBI, is proposed. First, a lightweight Context Guided module is introduced into the backbone network to significantly reduce the number of model parameters and the amount of computation. At the same time, the multi-scale attention mechanism EMA is introduced to capture fine-grained spatial information and improve detection of small targets and of targets in complex backgrounds. Second, the weighted bidirectional feature pyramid network BIFPN is used to rebuild the neck, enhancing multi-scale feature fusion while keeping the parameter cost unchanged. Finally, the Inner-CIOU loss function is adopted; it generates auxiliary bounding boxes so that the loss is calculated more accurately and bounding box regression converges faster. Experiments on the VisDrone dataset show that, compared with the original YOLOv8s algorithm, the proposed method reduces the number of parameters by 51.3%, reduces the amount of computation by 28.5%, and increases mAP50 by 1.6%. The proposed model therefore improves accuracy while lowering the demand for computing resources, striking a balance between the two.
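
The abstract describes the auxiliary bounding box only in words, so a small illustration may help. The following is a minimal sketch of the general Inner-IoU idea (Zhang et al.), not the authors' implementation: each box is shrunk or enlarged around its centre by a scale factor, and the IoU is computed on these auxiliary boxes. The function name `inner_iou`, the `(cx, cy, w, h)` box layout, and the default `ratio=0.75` are assumptions made for illustration; the paper's actual settings are not given in the abstract.

```python
def inner_iou(box_a, box_b, ratio=0.75):
    """IoU of auxiliary boxes obtained by scaling each box around its centre.

    Boxes are (cx, cy, w, h). `ratio` is a hypothetical scale factor:
    values below 1 shrink the boxes, values above 1 enlarge them.
    """
    def scaled_corners(box):
        cx, cy, w, h = box
        half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    ax1, ay1, ax2, ay2 = scaled_corners(box_a)
    bx1, by1, bx2, by2 = scaled_corners(box_b)

    # Overlap of the two auxiliary boxes (zero if they do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    pred = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 0.0, 2.0, 2.0)
    print(inner_iou(pred, target, ratio=1.0))  # plain IoU of the original boxes
    print(inner_iou(pred, target, ratio=0.5))  # IoU of the shrunken auxiliary boxes
```

As we understand the Inner-IoU formulation, an Inner-CIoU loss would then be formed by adding the difference between the ordinary IoU and this inner IoU to the standard CIoU loss; that combination is left out of the sketch.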

Key words: Drones, Aerial images, Attention mechanism, Loss function, Lightweighting

CLC Number: TP391