Transformer Object Detection Algorithm Based on Multi-granularity

doi:10.11896/jsjkx.230600028

Abstract

Compared with objects at other scales, small objects carry less semantic information and have fewer training samples, so current object detection algorithms achieve low detection accuracy on small objects. To address this problem, a Transformer object detection algorithm based on multi-granularity is proposed. First, following the multi-granularity idea, a new Transformer serialization method is designed to predict object positions from coarse to fine granularity, improving the model's localization performance. Then, based on the three-way decision idea, fine-grained mining of small-object samples and regular-scale object samples increases the number of small-object samples and hard negative samples. Finally, experimental results on the COCO dataset show that the algorithm reaches an average precision for small objects (APs) of 31.5% and a mean average precision (mAP) of 49.1%. Compared with the baseline model, APs improves by 1.4% and mAP improves by 2.2%. The algorithm effectively improves the detection of small objects and significantly improves the overall accuracy of object detection.

Key words: Small object detection, Multi-granularity, Three-way decision, Transformer, Deep learning

CLC Number: TP389.1

XU Fang, MIAO Duoqian, ZHANG Hongyun. Transformer Object Detection Algorithm Based on Multi-granularity [J]. Computer Science, 2023, 50(11): 143-150.
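
The abstract names the three-way decision idea without giving details. As a minimal illustration of the general concept only, and not the authors' sample-mining procedure, the sketch below splits candidate boxes into positive, boundary, and negative regions using two thresholds. The use of IoU as the evaluation function, the function names `iou` and `three_way_partition`, and the threshold values `alpha` and `beta` are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def three_way_partition(candidates, gt_box, alpha=0.7, beta=0.3):
    """Assign each candidate box to one of three regions.

    alpha and beta are hypothetical thresholds (alpha > beta):
      score >= alpha        -> accept   ("positive")
      score <= beta         -> reject   ("negative")
      beta < score < alpha  -> defer    ("boundary")
    """
    if not alpha > beta:
        raise ValueError("alpha must be greater than beta")
    regions = {"positive": [], "boundary": [], "negative": []}
    for box in candidates:
        score = iou(box, gt_box)
        if score >= alpha:
            regions["positive"].append(box)
        elif score <= beta:
            regions["negative"].append(box)
        else:
            regions["boundary"].append(box)
    return regions
```

In a three-way scheme, the boundary region holds the undecided cases; these are the ones that could be passed on for finer-grained examination rather than being forced into an accept or reject decision.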

