Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231000106-7. doi: 10.11896/jsjkx.231000106

• Image Processing & Multimedia Technology •

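Research on a Pedestrian Detection Method Based on Multi-layer Feature Fusion

HUANG Lingwa, CUI Wencheng, SHAO Hong

  1. School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870
  • Publication date: 2024-11-16 Release date: 2024-11-13
  • Corresponding author: CUI Wencheng (624618764@qq.com)
  • About the author: (18848972916@163.com)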

Study on Pedestrian Detection Method Based on Multi-level Feature Fusion

HUANG Lingwa, CUI Wencheng, SHAO Hong   

  1. School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110870, China
  • Online: 2024-11-16 Published: 2024-11-13
  • About author: HUANG Lingwa, born in 1995, postgraduate. Her main research interests include deep learning and image processing.
    CUI Wencheng, born in 1973, master, professor, is a member of CCF (No. 314307). His main research interests include collaborative computing, mobile health, medical big data, medical internet of things and medical soft robots.

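Abstract: Occluded pedestrians are hard to detect and recognize, which leads to low detection accuracy and a high miss rate. To address these problems, the structure of YOLOv7 is optimized and a pedestrian detection network model based on multi-layer feature fusion is proposed, with the aim of improving the accuracy of occluded pedestrian detection. The method adopts an ELAN-C module in the feature extraction part of the backbone network to strengthen the extraction of pedestrian feature information. In the multi-scale feature fusion part, a global attention mechanism is introduced to form multi-layer feature fusion; cross-dimensional information interaction, with particular attention to position information, strengthens the representation of target features and improves detection accuracy. In addition, to speed up model convergence, EIoU is adopted as the loss function, which further improves the localization accuracy of the detection boxes. The model is trained and validated on the public CityPersons dataset. The log-average miss rate MR⁻² decreases by 0.55%, 0.91%, 1.78% and 1.68% on the Bare, Partial, Reasonable and Heavy subsets, respectively, which effectively reduces the miss rate.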
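The abstract names EIoU as the bounding-box regression loss but gives no formula. As a minimal sketch, assuming the commonly published definition (1 - IoU plus normalized penalties on center distance, width difference and height difference, each measured against the smallest enclosing box), the loss for a single pair of boxes could be written as follows. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box layout and the `eps` guard are illustrative choices, not taken from the paper.

```python
def eiou_loss(box_pred, box_gt, eps=1e-9):
    """Sketch of the EIoU loss for two axis-aligned boxes (x1, y1, x2, y2).

    Assumed form: 1 - IoU + center term + width term + height term,
    with every penalty normalized by the smallest enclosing box.
    """
    px1, py1, px2, py2 = box_pred
    gx1, gy1, gx2, gy2 = box_gt

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box.
    dx = (px1 + px2) / 2.0 - (gx1 + gx2) / 2.0
    dy = (py1 + py2) / 2.0 - (gy1 + gy2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalized by the enclosing box sides.
    width_term = (pw - gw) ** 2 / (cw * cw + eps)
    height_term = (ph - gh) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Compared with a plain IoU loss, the extra terms still give a useful gradient when the boxes do not overlap, which is one plausible reason such a loss can speed up convergence.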

Keywords: YOLOv7, Pedestrian detection, Feature extraction, Multi-scale fusion, Loss function optimization

Abstract: To address the difficulty of detecting occluded pedestrians, together with low detection accuracy and a high miss rate, a pedestrian detection network model based on multi-layer feature fusion is proposed by optimizing the structure of YOLOv7, with the aim of improving the accuracy of occluded pedestrian detection. The method uses an ELAN-C module in the feature extraction part of the backbone network to enhance the extraction of pedestrian feature information. A global attention mechanism is introduced into the multi-scale feature fusion part to form multi-layer feature fusion. Through cross-dimensional information interaction, especially the focus on location information, the representation of target features is enhanced and the accuracy of pedestrian detection is improved. In addition, to accelerate model convergence, EIoU is used as the loss function, which further improves the localization accuracy of the detection boxes. The model is trained and validated on the public CityPersons dataset, where the log-average miss rate MR⁻² decreases by 0.55%, 0.91%, 1.78% and 1.68% on the Bare, Partial, Reasonable and Heavy subsets, respectively, effectively reducing the miss rate.
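The reported metric, the log-average miss rate, is not defined on this page. The sketch below assumes the widely used Caltech-style protocol: sample the miss rate at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0, then take the geometric mean. The function name `log_average_miss_rate` and its arguments are hypothetical, and the paper's own evaluation code may differ in detail.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Sketch of a Caltech-style log-average miss rate.

    fppi:      false positives per image, sorted in ascending order
    miss_rate: miss rate at each fppi value (same length as fppi)
    """
    # Reference FPPI values, evenly spaced in log space over [1e-2, 1e0].
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    sampled = []
    for ref in refs:
        # Use the miss rate at the largest FPPI not exceeding the reference.
        # If the curve has no such point, count every pedestrian as missed.
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m
            else:
                break
        # Clamp to avoid log(0) for a perfect detector.
        sampled.append(max(mr, 1e-10))

    # Geometric mean of the sampled miss rates.
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))
```

Under this definition a lower value is better, so the decreases listed for the Bare, Partial, Reasonable and Heavy subsets correspond to fewer missed pedestrians over the sampled FPPI range.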

Key words: YOLOv7, Pedestrian detection, Feature extraction, Multi-scale fusion, Loss function optimization

CLC Number:

  • TP391