Computer Science (计算机科学), 2024, Vol. 51, Issue (6): 264-271. doi: 10.11896/jsjkx.230300222

• Computer Graphics & Multimedia •

Center Point Target Detection Algorithm Based on Improved Swin Transformer

LIU Jiasen, HUANG Jun

  1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2023-03-29 Revised: 2023-08-25 Online: 2024-06-15 Published: 2024-06-05
  • Corresponding author: HUANG Jun (huangjun@cqupt.edu.cn)
  • Author e-mail: 2352556955@qq.com
  • About author: LIU Jiasen, born in 1999, postgraduate. His main research interests include object detection and deep learning.
    HUANG Jun, born in 1971, Ph.D., professor, master's supervisor. His main research interests include object detection and deep learning.
  • Supported by:
    National Natural Science Foundation of China (61771085).

Abstract: To address the shortcomings of Swin Transformer in extracting local feature information and in feature expression, this paper proposes a center point object detection algorithm based on an improved Swin Transformer, with the aim of improving its object detection performance. The network structure is adjusted and a deconvolution module is introduced to strengthen the extraction of local feature information. An adaptive two-dimensional Gaussian kernel and a regression head module are used to detect object center points, which enhances feature expression. Dropout is added to the Swin Transformer block to alleviate overfitting. The improved algorithm is validated on the Pascal VOC and MS COCO 2017 datasets. Experimental results show that it achieves an accuracy of 81.1% on Pascal VOC and 37.2% on MS COCO, clearly outperforming other mainstream object detection algorithms.

Key words: Deep learning, Image processing, Object detection, Deconvolution, Swin Transformer
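
The abstract mentions detecting object center points with an adaptive two-dimensional Gaussian kernel. The sketch below illustrates the general idea only and is not the authors' implementation: it assumes the Gaussian spread along each axis is proportional to the box width and height (the `sigma_scale` factor is hypothetical), and it uses a simple 3x3 local-maximum search in place of a learned detection head. The function names `draw_center_heatmap` and `find_peaks` are made up for this example.

```python
import math


def draw_center_heatmap(height, width, boxes, sigma_scale=1.0 / 6.0):
    """Render a center-point heatmap with one 2D Gaussian per box.

    boxes: list of (x_min, y_min, x_max, y_max) in heatmap coordinates.
    sigma_scale: hypothetical factor tying the Gaussian spread to box size.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        # Box center, rounded to the nearest heatmap cell.
        cx = int(round((x_min + x_max) / 2.0))
        cy = int(round((y_min + y_max) / 2.0))
        # Adaptive spread (assumption): wider boxes get a wider Gaussian
        # along x, taller boxes a wider Gaussian along y.
        sigma_x = max((x_max - x_min) * sigma_scale, 1e-3)
        sigma_y = max((y_max - y_min) * sigma_scale, 1e-3)
        for y in range(height):
            for x in range(width):
                value = math.exp(
                    -((x - cx) ** 2) / (2.0 * sigma_x ** 2)
                    - ((y - cy) ** 2) / (2.0 * sigma_y ** 2)
                )
                # Where Gaussians overlap, keep the element-wise maximum.
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def find_peaks(heatmap, threshold=0.5):
    """Return (x, y, score) for cells that are 3x3 local maxima above threshold."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y][x]
            if score < threshold:
                continue
            neighbors = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            if score >= max(neighbors):
                peaks.append((x, y, score))
    return peaks


if __name__ == "__main__":
    target = draw_center_heatmap(32, 32, [(2, 2, 10, 8), (20, 20, 28, 30)])
    print(find_peaks(target))
```

In a full detector, a heatmap like this would typically serve as the training target, and a regression head would predict the box size (and possibly a sub-cell offset) at each detected peak. That part is omitted here because the abstract does not give its details.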

CLC Number: 

  • TP391