Computer Science, 2024, Vol. 51, Issue (11): 174-181. doi: 10.11896/jsjkx.231000009

• Computer Graphics & Multimedia •

High-precision Real-time Semantic Segmentation Algorithm Architecture for Autonomous Driving

GENG Huantong1,2,3, LI Jiaxing1, JIANG Jun1, LIU Zhenyu1, FAN Zichen4

  1 School of Computer Science, Nanjing University of Information Science & Technology, Nanjing 210044, China
  2 China Meteorological Administration Radar Meteorology Key Laboratory, Nanjing 210044, China
  3 School of Information Technology, Jiangsu Open University, Nanjing 210036, China
  4 School of Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
  • Received: 2023-10-07  Revised: 2024-04-17  Online: 2024-11-15  Published: 2024-11-06
  • Corresponding author: GENG Huantong (htgeng@nuist.edu.cn)
  • About author: GENG Huantong, born in 1973, professor, Ph.D. supervisor, is a senior member of CCF (No. 12356S). His main research interests include multi-objective optimization and deep learning.
  • Supported by:
    National Natural Science Foundation of China (42375145) and Open Grants of China Meteorological Administration Radar Meteorology Key Laboratory (2023LRM-A02).

Abstract: The proportional-integral-derivative (PID) semantic segmentation architecture mitigates overshoot in dual-branch architectures, where detail features are easily overwhelmed by the surrounding contextual information, while achieving superior performance. However, the high-resolution boundary branch in this architecture significantly slows inference. To address this issue, an efficient PID architecture built on a spatial attention mechanism and a lightweight auxiliary semantic branch is proposed. A lightweight attention fusion module extracts precise contextual information and guides the fusion of different feature information. A fast aggregation pyramid pooling module rapidly aggregates semantic information across multiple scales. In addition, a deep supervision training strategy incorporating the Canny edge detection operator is designed to improve training. Compared with the baseline, the proposed model gains 6% in accuracy at a small latency cost. It achieves a good balance between accuracy and speed on the Cityscapes, CamVid, and KITTI datasets, with higher accuracy than existing models in the same speed range. Notably, the model reaches 78.5% accuracy at 120.9 frames/s on the Cityscapes test set.

Key words: Real-time semantic segmentation, Autonomous driving, Overshoot, Spatial attention mechanism, Edge detection
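
The abstract states that a spatial attention mechanism guides the fusion of different feature information, but it does not give the module's formula. The following is only a minimal sketch of the general idea, under an assumed gating rule: a per-pixel gate, computed as the sigmoid of the channel-wise dot product of the two feature vectors, blends a detail feature map with a context feature map. The function name `spatial_attention_fuse`, the gating rule, and the nested-list layout (height x width x channels) are illustrative assumptions, not the paper's lightweight attention fusion module.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention_fuse(detail, context):
    """Blend two feature maps with a per-pixel attention gate.

    Both inputs are nested lists shaped [height][width][channels].
    ASSUMPTION (not from the paper): the gate at each pixel is
    sigmoid(dot(detail_pixel, context_pixel)), and the output is
    gate * context + (1 - gate) * detail.
    """
    fused = []
    for detail_row, context_row in zip(detail, context):
        fused_row = []
        for d_px, c_px in zip(detail_row, context_row):
            # One scalar gate per spatial location, shared by all channels.
            gate = sigmoid(sum(d * c for d, c in zip(d_px, c_px)))
            fused_row.append(
                [gate * c + (1.0 - gate) * d for d, c in zip(d_px, c_px)]
            )
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    # A 1x2 map with 2 channels per pixel (toy values).
    detail_map = [[[0.0, 0.0], [1.0, -1.0]]]
    context_map = [[[2.0, 4.0], [3.0, 3.0]]]
    print(spatial_attention_fuse(detail_map, context_map))
```

Where the two feature vectors are orthogonal, the gate is 0.5 and the output is the plain average of the two branches. The more strongly they agree, the closer the output moves toward the context branch.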

CLC Number: TP391