Computer Science, 2024, Vol. 51, Issue (3): 165-173. doi: 10.11896/jsjkx.230200030

• Computer Graphics & Multimedia •

Object Detection Method with Multi-scale Feature Fusion for Remote Sensing Images

ZHANG Yang, XIA Ying

  1. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2023-02-06  Revised: 2023-03-10  Online: 2024-03-15  Published: 2024-03-13
  • Corresponding author: XIA Ying (xiaying@cqupt.edu.cn)
  • About author: ZHANG Yang, born in 1997, postgraduate. His main research interests include remote sensing and object detection. XIA Ying, born in 1972, Ph.D., professor, Ph.D. supervisor, is a senior member of CCF (No. 10248S). Her main research interests include spatiotemporal big data and cross-media retrieval.
  • Author e-mail: s200231184@stu.cqupt.edu.cn
  • Supported by:
    National Natural Science Foundation of China (41871226) and Key Cooperation Projects of Chongqing Municipal Education Commission (HZ2021008).


Abstract: Object detection in remote sensing images is an important research direction in computer vision and is widely used in military and civil fields. Objects in remote sensing images vary widely in scale, are often densely arranged, and can look similar across classes. As a result, object detection methods designed for natural images produce many missed and false detections on remote sensing images. To address this problem, this paper proposes a multi-scale feature fusion object detection method for remote sensing images based on YOLOv5. Firstly, a residual unit that fuses multi-head self-attention is introduced into the backbone network. This unit fully extracts multi-level feature information and reduces the semantic differences among different scales. Secondly, a feature pyramid network that fuses a lightweight upsampling operator is introduced to obtain high-level semantic features and low-level detail features. Feature fusion then yields feature maps with richer feature information, which improves the feature resolution of objects at different scales. The proposed method is evaluated on the public datasets DOTA and NWPU VHR-10. Compared with the baseline model, its mean average precision (mAP) improves by 1.5% and 2.0%, respectively.

Key words: Remote sensing images, Object detection, Multi-scale features, Feature fusion, YOLOv5
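The two modules named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, because the abstract does not give the exact layer configuration.

The sketch makes the following assumptions:

- The residual unit is a generic multi-head self-attention layer over all pixel positions plus an identity shortcut, with no positional encodings or normalization.
- The lightweight upsampler is a simplified CARAFE-style content-aware reassembly operator. Its kernels come from a single 1x1 projection rather than CARAFE's channel compressor and content encoder.
- The channel concatenation in `fuse_top_down` mirrors YOLOv5's neck and is not a confirmed detail of this paper.

All function names and weight arrays (`mhsa_residual`, `carafe_upsample`, `fuse_top_down`, `wq`, `w_enc`, and so on) are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mhsa_residual(x, wq, wk, wv, wo, num_heads):
    """Residual unit with multi-head self-attention over all H*W positions.

    x: feature map (C, H, W). wq/wk/wv/wo: hypothetical (C, C) projection
    weights. Positional encodings and normalization are omitted (assumption).
    """
    c, h, w = x.shape
    assert c % num_heads == 0, "channels must divide evenly across heads"
    d = c // num_heads
    n = h * w
    tokens = x.reshape(c, n).T                                   # (N, C): one token per pixel

    def split_heads(t):
        return t.reshape(n, num_heads, d).transpose(1, 0, 2)     # (heads, N, d)

    q, k, v = (split_heads(tokens @ m) for m in (wq, wk, wv))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (heads, N, N)
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, c)            # concatenate heads
    return x + (merged @ wo).T.reshape(c, h, w)                     # identity shortcut


def carafe_upsample(x, w_enc, scale=2, k_up=3):
    """Simplified CARAFE-style content-aware upsampling by `scale`.

    w_enc: hypothetical 1x1 kernel-predictor weights, shape
    (scale*scale*k_up*k_up, C). Real CARAFE uses a channel compressor
    plus a k_enc x k_enc content encoder instead.
    """
    c, h, w = x.shape
    r = k_up // 2
    logits = np.einsum("oc,chw->ohw", w_enc, x)                  # (s*s*k*k, H, W)
    # One softmax-normalized k_up x k_up kernel per output sub-position.
    kernels = softmax(logits.reshape(scale, scale, k_up * k_up, h, w), axis=2)
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))                      # zero padding
    patches = np.stack([xp[:, dy:dy + h, dx:dx + w]               # (C, k*k, H, W)
                        for dy in range(k_up) for dx in range(k_up)], axis=1)
    out = np.empty((c, h * scale, w * scale))
    for sy in range(scale):
        for sx in range(scale):
            # Output pixel (i*scale+sy, j*scale+sx) reassembles the
            # neighbourhood of source pixel (i, j) with its own kernel.
            out[:, sy::scale, sx::scale] = np.einsum(
                "ckhw,khw->chw", patches, kernels[sy, sx])
    return out


def fuse_top_down(low, high, w_enc):
    """Upsample the coarse map and concatenate it with the finer map
    along channels (assumed to follow YOLOv5's upsample-then-concat neck)."""
    return np.concatenate([carafe_upsample(high, w_enc), low], axis=0)


rng = np.random.default_rng(0)
p4 = rng.standard_normal((4, 8, 8))    # finer, lower-level feature map
p5 = rng.standard_normal((8, 4, 4))    # coarser, higher-level feature map
ws = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]
p5 = mhsa_residual(p5, *ws, num_heads=2)
fused = fuse_top_down(p4, p5, rng.standard_normal((2 * 2 * 3 * 3, 8)) * 0.1)
print(fused.shape)  # (12, 8, 8)
```

The reassembly kernels are softmax-normalized, so their weights sum to 1. A constant input is therefore reproduced exactly away from the zero-padded border, which gives a quick sanity check on the upsampling step. Likewise, setting `wo` to zeros reduces `mhsa_residual` to its identity shortcut.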

CLC number: TP753