Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240100138-5. doi: 10.11896/jsjkx.240100138

• Image Processing & Multimedia Technology •


Road Obstacle Detection Method Based on Self-attention and Bidirectional Feature Fusion

LI Ting, ZHAO Erdun, YANG Jun   

  1. School of Computer Science, Central China Normal University, Wuhan 430079, China
  • Online: 2024-11-16  Published: 2024-11-13
  • Corresponding author: ZHAO Erdun (erdunz@ccnu.edu.cn)
  • About author: LI Ting (liting@mails.ccnu.edu.cn), born in 1997, postgraduate. Her main research interests include computer vision, object detection and autonomous driving.
    ZHAO Erdun, born in 1972, Ph.D., associate professor. His main research interests include computer vision, object detection and autonomous driving.

Abstract: With the rapid development of technology, assisted driving has become an important direction for the future of the automotive industry. In image-based road obstacle detection, existing methods have limited ability to detect targets with large scale variation, small targets, and occluded targets, often resulting in false and missed detections. To address this problem, a road obstacle detection method based on self-attention and bidirectional feature fusion (CoXT-FCOS) is proposed. The method introduces a grouped self-attention module, CoXT, into the backbone feature-extraction network to strengthen its ability to capture global information; to mitigate the occlusion problem, the cross-stage pyramid pooling module SPPCSPC is introduced; and a path-enhancement network is added to the feature-fusion stage, forming a bidirectional feature-fusion module, ESPAFPN, that improves the network's perception of small targets. Experiments show that CoXT-FCOS achieves high accuracy, reaching an mAP of 88% on the CODA dataset, and detects obstacles on the road more accurately.
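
To make the bidirectional feature-fusion idea concrete, the sketch below shows a minimal PyTorch module that combines a top-down FPN pass with a bottom-up path-aggregation pass in the style of PANet [9]. It illustrates the general mechanism behind ESPAFPN rather than the paper's exact design; the channel widths, nearest-neighbour upsampling, and strided-convolution downsampling are illustrative assumptions.

# Minimal PyTorch sketch of bidirectional feature fusion (top-down FPN pass
# followed by a bottom-up path-aggregation pass, as in PANet [9]).  The channel
# widths, nearest-neighbour upsampling and strided 3x3 downsampling convolutions
# are assumptions for illustration, not the exact ESPAFPN configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalFPN(nn.Module):
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs project backbone features C3..C5 to a common width
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convs smooth the merged features after the top-down pass
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels)
        # strided 3x3 convs carry fine-level features back up (bottom-up pass)
        self.down = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
            for _ in in_channels[:-1])
        self.fuse = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1)
            for _ in in_channels[:-1])

    def forward(self, feats):
        # feats: backbone outputs [C3, C4, C5], highest resolution first
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        # top-down pass: propagate coarse semantic context to finer levels
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        td = [conv(x) for conv, x in zip(self.smooth, laterals)]
        # bottom-up pass: re-inject precise localisation cues into deeper levels
        outs = [td[0]]
        for i in range(len(td) - 1):
            outs.append(self.fuse[i](td[i + 1] + self.down[i](outs[-1])))
        return outs  # [P3, P4, P5], fed to the FCOS-style detection heads

if __name__ == "__main__":
    # three backbone levels at strides 8, 16 and 32 for a 256x256 input
    c3 = torch.randn(1, 512, 32, 32)
    c4 = torch.randn(1, 1024, 16, 16)
    c5 = torch.randn(1, 2048, 8, 8)
    for p in BidirectionalFPN()([c3, c4, c5]):
        print(p.shape)  # (1,256,32,32), (1,256,16,16), (1,256,8,8)

The added bottom-up path shortens the route by which fine-grained localisation information reaches the deeper pyramid levels, which is the mechanism the abstract credits for the improved detection of small targets.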

Key words: Obstacle detection, Autonomous driving, Fully convolutional one-stage object detection (FCOS), Self-attention, Feature fusion

CLC Number: TP391

References:
[1]TIAN Z,SHEN C,CHEN H,et al.FCOS:Fully convolutional one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:9627-9636.
[2]GIRSHICK R.Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:1440-1448.
[3]REN S,HE K,GIRSHICK R,et al.Faster R-CNN:Towards Real-Time Object Detection with Region Proposal Networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
[4]LIN T Y,DOLLÁR P,GIRSHICK R,et al.Feature Pyramid Networks for Object Detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:2117-2125.
[5]REDMON J,DIVVALA S,GIRSHICK R,et al.You Only Look Once:Unified,Real-Time Object Detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:779-788.
[6]LIN T Y,GOYAL P,GIRSHICK R,et al.Focal Loss for Dense Object Detection[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:2980-2988.
[7]CARION N,MASSA F,SYNNAEVE G,et al.End-to-end object detection with transformers[C]//European Conference on Computer Vision.Cham:Springer International Publishing,2020:213-229.
[8]ZHANG H,LI F,LIU S,et al.DINO:DETR with improved denoising anchor boxes for end-to-end object detection[J].arXiv:2203.03605,2022.
[9]LIU S,QI L,QIN H,et al.Path aggregation network for instance segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:8759-8768.
[10]LI Y,YAO T,PAN Y,et al.Contextual transformer networks for visual recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2022,45(2):1489-1500.
[11]XIE S,GIRSHICK R,DOLLÁR P,et al.Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1492-1500.
[12]WANG C Y,BOCHKOVSKIY A,LIAO H Y M.YOLOv7:Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:7464-7475.
[13]HE K,ZHANG X,REN S,et al.Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2015,37(9):1904-1916.
[14]WANG C Y,LIAO H Y M,WU Y H,et al.CSPNet:A new backbone that can enhance learning capability of CNN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops.2020:390-391.
[15]ZHENG Z,WANG P,LIU W,et al.Distance-IoU loss:Faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:12993-13000.
[16]LI K,CHEN K,WANG H,et al.CODA:A real-world road corner case dataset for object detection in autonomous driving[C]//European Conference on Computer Vision.Cham:Springer Nature Switzerland,2022:406-423.
[17]ZHAO S,CHEN N.Review on Small-scale Pedestrian Detection Technology for Complex Pavement[J].Computer Systems and Applications,2022,31(7):1-11.