Computer Science, 2023, Vol. 50, Issue (1): 105-113. doi: 10.11896/jsjkx.211100208
蔡肖1, 陈志华1, 盛斌2
CAI Xiao1, CHEN Zhihua1, SHENG Bin2
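Abstract: Object detection is a fundamental and widely studied task in computer vision. Object detection in remote sensing images has also become a research hotspot because of its important applications in transportation, military, agriculture, and other fields. Compared with natural images, remote sensing images suffer from interference by complex backgrounds and are affected by factors such as weather and small or irregularly shaped objects, so achieving high accuracy in remote sensing object detection is very challenging. This paper proposes a novel object detection network based on the shifted-window Transformer. The model uses shifted-window Transformer modules as the feature-extraction backbone: the Transformer's self-attention mechanism is effective for detecting objects against cluttered backgrounds, while the shifted-window scheme avoids a large amount of quadratic-complexity computation. After obtaining the feature maps extracted by the backbone, the model uses a pyramid architecture to fuse local and global features of different scales and semantics, which reduces information loss between feature layers and captures the inherent multi-scale hierarchical relationships. In addition, the paper proposes a self-mixed vision transformer module and a cross-layer vision transformer module. The self-mixed vision transformer module re-renders the deep feature maps to strengthen object feature recognition and representation, and the cross-layer vision transformer module rearranges the pixel-level information representation of each feature layer according to the level of contextual interaction among features. These modules are integrated into bottom-up and top-down bidirectional feature paths to make full use of global and local information carrying different semantics. The proposed network is trained and tested on the UCAS-AOD and RSOD datasets. Experimental results show that the model performs well on remote sensing object detection and is especially suited to irregular and small object categories, such as overpasses and cars.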
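The abstract's claim that shifted windows avoid quadratic-cost computation can be illustrated with a small sketch. It is not the authors' network: the cost formulas are the commonly cited estimates for global versus window-based self-attention, and the function names, the feature-map size, and the window size are all assumptions chosen for illustration. The window-id helper also leaves out the attention masking that a real shifted-window layer applies to wrapped-around regions.

```python
def global_attention_cost(h, w, c):
    """Approximate multiply-adds of global self-attention on an h x w map with c channels."""
    tokens = h * w
    # Linear projections (4 * tokens * c^2) plus attention over all token pairs.
    return 4 * tokens * c * c + 2 * tokens * tokens * c


def window_attention_cost(h, w, c, m):
    """Approximate multiply-adds when attention is restricted to m x m windows."""
    tokens = h * w
    # Each token attends only to the m * m tokens in its own window.
    return 4 * tokens * c * c + 2 * (m * m) * tokens * c


def window_ids(h, w, m, shift=0):
    """Window index of every pixel after cyclically shifting the grid by `shift`.

    Hypothetical helper: it only shows which pixels share a window.
    """
    cols = w // m
    return [
        [((i - shift) % h) // m * cols + ((j - shift) % w) // m for j in range(w)]
        for i in range(h)
    ]


if __name__ == "__main__":
    # Assumed sizes: a 56 x 56 feature map, 96 channels, 7 x 7 windows.
    print("global  :", global_attention_cost(56, 56, 96))
    print("windowed:", window_attention_cost(56, 56, 96, 7))

    # Pixels (3, 3) and (4, 4) are neighbours split by a window border...
    plain = window_ids(8, 8, 4)
    # ...until the grid is shifted by half a window.
    shifted = window_ids(8, 8, 4, shift=2)
    print(plain[3][3] == plain[4][4], shifted[3][3] == shifted[4][4])
```

The global cost grows with the square of the number of tokens, while the windowed cost grows linearly for a fixed window size. Alternating plain and shifted partitions lets information cross window borders without returning to global attention.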