Computer Science ›› 2024, Vol. 51 ›› Issue (5): 162-171. doi: 10.11896/jsjkx.230300113
JIAN Yingjie (简英杰), YANG Wenxia (杨文霞), FANG Xi (方玺), HAN Huan (韩欢)
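Abstract: Because point cloud data is highly sparse, most current point-cloud-based 3D object detection algorithms learn the local features of the point cloud insufficiently, and some of the invalid information contained in the point cloud interferes with detection. To address these problems, a 3D object detection model based on edge convolution and bottleneck attention is proposed. First, multiple layers of edge convolution (Edge Convolution, EdgeConv) are constructed: for each point in the point cloud, the K points closest to it in feature space are found to build a K-nearest-neighbor graph, from which multi-scale local features of the point cloud are learned. Second, a bottleneck attention module (Bottleneck Attention Module, BAM) suited to 3D point cloud data is designed. Each BAM consists of a channel attention module and a spatial attention module, which strengthen the point cloud information that is valuable for object detection and improve the representational power of the network. The network takes VoteNet as its baseline, and the multi-layer edge convolution and the BAM are inserted, in that order, between the PointNet++ network and the voting module. The model is evaluated on the public SUN RGB-D and ScanNetV2 datasets and compared with 13 current state-of-the-art 3D object detection algorithms. The experimental results show that, on SUN RGB-D, the proposed model achieves the highest mean average precision at an intersection over union (Intersection over Union, IoU) threshold of 0.5 (mAP@0.5) and the best per-class accuracy (AP@0.25) on 6 of the 10 object categories, including bed, chair, and desk. On ScanNetV2, the model achieves the best mAP@0.25 and mAP@0.5, and the best per-class accuracy (AP@0.25) on 10 of the 18 object categories, including chair, sofa, and picture. Compared with the VoteNet baseline, the proposed model improves mAP@0.25 on the two datasets by 6.5% and 12.9%, respectively, and ablation experiments confirm the effectiveness of the added edge convolution and bottleneck attention modules.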
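The refinement stage described in the abstract, EdgeConv followed by a BAM, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The edge feature (x_i, x_j - x_i) with max aggregation follows the Dynamic Graph CNN formulation of EdgeConv, and a single linear map with ReLU stands in for the learned MLP. The attention step uses the original image-domain BAM rule F' = F + F * sigmoid(M_c(F) + M_s(F)); how the paper adapts the spatial branch to points is not stated in the abstract, so the per-point linear projection here is a guess. All function names and weight arguments are hypothetical.

```python
import math

def knn_indices(features, k):
    """Indices of each point's k nearest neighbours, measured in feature space."""
    neighbours = []
    for i, fi in enumerate(features):
        # Assumption: a point is not counted as its own neighbour.
        ranked = sorted((math.dist(fi, fj), j)
                        for j, fj in enumerate(features) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours

def edge_conv(features, k, weight):
    """One EdgeConv layer over N points with C channels each.

    `weight` is a hypothetical (C_out x 2C) matrix standing in for the MLP.
    """
    out = []
    for i, nbrs in enumerate(knn_indices(features, k)):
        xi = features[i]
        pooled = None
        for j in nbrs:
            # Edge feature: the centre point concatenated with the offset x_j - x_i.
            edge = list(xi) + [b - a for a, b in zip(xi, features[j])]
            h = [max(0.0, sum(w * e for w, e in zip(row, edge))) for row in weight]
            # Max aggregation over the neighbourhood, channel by channel.
            pooled = h if pooled is None else [max(p, q) for p, q in zip(pooled, h)]
        out.append(pooled)
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def bam(features, w_reduce, w_expand, w_point):
    """Bottleneck attention over N x C features (hypothetical point-cloud variant).

    w_reduce (C/r x C) and w_expand (C x C/r) form the channel-branch bottleneck;
    w_point (length C) is an assumed per-point projection for the spatial branch.
    """
    n, c = len(features), len(features[0])
    # Channel branch: average over points, then a bottleneck MLP, one score per channel.
    mean = [sum(f[ch] for f in features) / n for ch in range(c)]
    hidden = [max(0.0, v) for v in matvec(w_reduce, mean)]
    channel = matvec(w_expand, hidden)
    # Spatial branch: one score per point.
    spatial = [sum(w * v for w, v in zip(w_point, f)) for f in features]
    # Broadcast-add both branches, squash, and refine with a residual connection.
    return [[f[ch] * (1.0 + sigmoid(channel[ch] + spatial[i])) for ch in range(c)]
            for i, f in enumerate(features)]
```

Stacking several `edge_conv` calls and recomputing the neighbour graph from each layer's output is one way to obtain the multi-scale local features the abstract mentions; the result would then pass through `bam` before the voting module.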