Computer Science ›› 2024, Vol. 51 ›› Issue (5): 162-171. doi: 10.11896/jsjkx.230300113

• Computer Graphics & Multimedia •


3D Object Detection Based on Edge Convolution and Bottleneck Attention Module for Point Cloud

JIAN Yingjie, YANG Wenxia, FANG Xi, HAN Huan   

  1. School of Science,Wuhan University of Technology,Wuhan 430070,China
  • Received:2023-03-13 Revised:2023-10-21 Online:2024-05-15 Published:2024-05-08
  • Corresponding author: YANG Wenxia (wenxiayang@whut.edu.cn)
  • About author: JIAN Yingjie, born in 1998, postgraduate (jianyingjiejie@163.com). His main research interests include 2D and 3D object detection.
    YANG Wenxia, born in 1978, Ph.D., associate professor. Her main research interests include image and video processing.
  • Supported by:
    National Key R & D Program of China(2020YFA0714200) and National Natural Science Foundation of China(11901443).


Abstract: Because point cloud data are highly sparse, most current point-cloud-based 3D object detection methods learn local features inadequately, and the invalid information contained in point clouds can interfere with detection. To address these problems, a 3D object detection model based on edge convolution (EdgeConv) and a bottleneck attention module (BAM) is proposed. First, multilayer edge convolutions are constructed: for each point, a K-nearest-neighbor graph is built in feature space to learn multi-scale local features of the point cloud. Second, a BAM suited to 3D point cloud data is designed; each BAM consists of a channel attention module and a spatial attention module, which enhance the point cloud information valuable for detection and strengthen the representation ability of the network. The network takes VoteNet as its baseline, with the multilayer edge convolutions and BAM inserted in sequence between the PointNet++ backbone and the voting module. The proposed model is evaluated on the SUN RGB-D and ScanNetV2 benchmarks and compared with 13 state-of-the-art 3D object detection methods. On SUN RGB-D, it achieves the highest mAP@0.5 and the best AP@0.25 on six of the ten categories, such as bed, chair, and desk. On ScanNetV2, it achieves the best mAP at both IoU 0.25 and 0.5 and the best AP@0.25 on ten of the eighteen categories, such as chair, sofa, and picture. Compared with the baseline VoteNet, the proposed model improves mAP@0.25 by 6.5% and 12.9% on the two datasets, respectively. Ablation studies verify the effectiveness of the added edge convolution and bottleneck attention modules.
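
The abstract describes the two added blocks only at the level of tensor flow. The PyTorch sketch below is an illustration rather than the authors' released code: it shows a DGCNN-style EdgeConv layer (K-nearest-neighbor graph built in feature space, edge features max-aggregated over neighbors) and a BAM-style attention block adapted to point features of shape (batch, channels, points). The layer widths, the choice k=16, and the module layout are assumptions made for illustration.

```python
# Illustrative sketch of the two blocks named in the abstract. Shapes follow
# the (B, C, N) convention; all hyperparameters here are assumptions.
import torch
import torch.nn as nn


def knn(x, k):
    # x: (B, C, N). Nearest neighbors by squared distance in feature space.
    inner = -2 * torch.matmul(x.transpose(2, 1), x)      # (B, N, N)
    xx = torch.sum(x ** 2, dim=1, keepdim=True)          # (B, 1, N)
    neg_dist = -xx - inner - xx.transpose(2, 1)          # -||x_i - x_j||^2
    return neg_dist.topk(k=k, dim=-1).indices            # (B, N, k)


def get_graph_feature(x, k):
    # Edge features [x_j - x_i, x_i] for each point and its k neighbors.
    B, C, N = x.shape
    idx = knn(x, k) + torch.arange(B, device=x.device).view(-1, 1, 1) * N
    pts = x.transpose(2, 1).contiguous()                 # (B, N, C)
    neigh = pts.reshape(B * N, C)[idx.reshape(-1)].reshape(B, N, k, C)
    center = pts.unsqueeze(2).expand(-1, -1, k, -1)      # (B, N, k, C)
    edge = torch.cat((neigh - center, center), dim=3)    # (B, N, k, 2C)
    return edge.permute(0, 3, 1, 2).contiguous()         # (B, 2C, N, k)


class EdgeConv(nn.Module):
    # Shared MLP over edge features, max-aggregated over the k neighbors.
    def __init__(self, in_c, out_c, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * in_c, out_c, 1, bias=False),
            nn.BatchNorm2d(out_c),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):                                # x: (B, C, N)
        return self.mlp(get_graph_feature(x, self.k)).max(dim=-1).values


class PointBAM(nn.Module):
    # BAM adapted to 1D point features: sigmoid(channel + spatial) attention,
    # applied residually as F * (1 + M(F)), as in Park et al.'s image BAM.
    def __init__(self, c, reduction=4):
        super().__init__()
        self.channel = nn.Sequential(                    # (B, C, N) -> (B, C, 1)
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(c, c // reduction, 1), nn.BatchNorm1d(c // reduction),
            nn.ReLU(), nn.Conv1d(c // reduction, c, 1),
        )
        self.spatial = nn.Sequential(                    # (B, C, N) -> (B, 1, N)
            nn.Conv1d(c, c // reduction, 1), nn.BatchNorm1d(c // reduction),
            nn.ReLU(), nn.Conv1d(c // reduction, 1, 1),
        )

    def forward(self, x):                                # x: (B, C, N)
        attn = torch.sigmoid(self.channel(x) + self.spatial(x))  # broadcast
        return x * (1 + attn)


if __name__ == "__main__":
    feats = torch.randn(2, 256, 1024)    # e.g. seed features from a backbone
    refined = PointBAM(256)(EdgeConv(256, 256, k=16)(feats))
    print(refined.shape)                 # torch.Size([2, 256, 1024])
```

As in the image-domain BAM, the attention map is applied residually as F·(1 + M(F)), so an uninformative attention map degrades gracefully toward the identity; in the paper's pipeline, blocks of this kind sit between the PointNet++ backbone and VoteNet's voting module.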

Key words: 3D object detection, Point clouds, Edge convolution, Bottleneck attention module, VoteNet, SUN RGB-D dataset, ScanNetV2 dataset

CLC number: TP183

References
[1]CHE A B,ZHANG H,LI C,et al.Single-stage 3D Object Detection Method Based on Point Cloud Data in Traffic Environment[J].Computer Science,2022,49(S2):567-572.
[2]SHEN Q,CHEN Y L,LIU S,et al.A Two-level Network-based Algorithm for 3D Object Detection[J].Computer Science,2020,47(10):145-150.
[3]GUO Y F,WU D H,WEI Q M.A Review of Point Cloud-based 3D Object Detection Methods Based on Deep Learning[J].Application Research of Computers,2023,40(1):20-27.
[4]QI C R,SU H,MO K,et al.PointNet:Deep Learning on Point Sets for 3D Classification and Segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2017:652-660.
[5]QI C R,SU H,NIEßNER M,et al.Volumetric and Multi-View CNNs for Object Classification on 3D Data[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2016:5648-5656.
[6]SU H,MAJI S,KALOGERAKIS E,et al.Multi-view Convolutional Neural Networks for 3D Shape Recognition[C]//Proceedings of the IEEE International Conference on Computer Vision.New York:IEEE Press,2015:945-953.
[7]ZHOU Y,TUZEL O.VoxelNet:End-to-End Learning for Point Cloud Based 3D Object Detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2018:4490-4499.
[8]WU Z,SONG S,KHOSLA A,et al.3D ShapeNets:A Deep Representation for Volumetric Shapes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2015:1912-1920.
[9]QI C R,YI L,SU H,et al.PointNet++:Deep Hierarchical Feature Learning on Point Sets in a Metric Space[C]//Conference and Workshop on Neural Information Processing Systems.Cambridge:MIT Press,2017:5099-5108.
[10]SHI S,WANG X,LI H.PointRCNN:3D Object Proposal Generation and Detection from Point Cloud[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2019:770-779.
[11]QI C R,LITANY O,HE K,et al.Deep Hough Voting for 3D Object Detection in Point Clouds[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.New York:IEEE Press,2019:9277-9286.
[12]QI C R,CHEN X,LITANY O,et al.ImVoteNet:Boosting 3D Object Detection in Point Clouds with Image Votes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2020:4404-4413.
[13]CHENG B,SHENG L,SHI S,et al.Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2021:8963-8972.
[14]ZHANG Z,SUN B,YANG H,et al.H3DNet:3D Object Detection Using Hybrid Geometric Primitives[C]//Computer Vision-ECCV 2020:16th European Conference.Berlin:Springer Press,2020:311-329.
[15]WANG H,SHI S,YANG Z,et al.RBGNet:Ray-based Grouping for 3D Object Detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2022:1110-1119.
[16]LIU Z,ZHANG Z,CAO Y,et al.Group-Free 3D Object Detection via Transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.New York:IEEE Press,2021:2949-2958.
[17]ZHENG Y,DUAN Y,LU J,et al.HyperDet3D:Learning a Scene-conditioned 3D Object Detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2022:5585-5594.
[18]WANG Y,SUN Y,LIU Z,et al.Dynamic Graph CNN for Learning on Point Clouds[J].ACM Transactions on Graphics(ToG),2019,38(5):1-12.
[19]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is All You Need[C]//Conference and Workshop on Neural Information Processing Systems.Cambridge:MIT Press,2017:5998-6008.
[20]HU J,SHEN L,SUN G.Squeeze-and-Excitation Networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2018:7132-7141.
[21]QIN Z,ZHANG P,WU F,et al.FcaNet:Frequency Channel Attention Networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.New York:IEEE Press,2021:783-792.
[22]JADERBERG M,SIMONYAN K,ZISSERMAN A.Spatial Transformer Networks[C]//Conference and Workshop on Neural Information Processing Systems.Cambridge:MIT Press,2015:2017-2025.
[23]CHU X,TIAN Z,WANG Y,et al.Twins:Revisiting the Design of Spatial Attention in Vision Transformers[C]//Conference and Workshop on Neural Information Processing Systems.Cambridge:MIT Press,2021:9355-9366.
[24]WOO S,PARK J,LEE J Y,et al.CBAM:Convolutional Block Attention Module[C]//Proceedings of the European Conference on Computer Vision(ECCV).Berlin:Springer Press,2018:3-19.
[25]PARK J,WOO S,LEE J Y,et al.BAM:Bottleneck Attention Module[C]//British Machine Vision Conference 2018.Newcastle:BMVA Press,2018:147-161.
[26]SONG S,LICHTENBERG S P,XIAO J.SUN RGB-D:A RGB-D Scene Understanding Benchmark Suite[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2015:567-576.
[27]DAI A,CHANG A X,SAVVA M,et al.ScanNet:Richly-annotated 3D Reconstructions of Indoor Scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2017:5828-5839.
[28]QI C R,LIU W,WU C,et al.Frustum PointNets for 3D Object Detection from RGB-D Data[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2018:918-927.
[29]MISRA I,GIRDHAR R,JOULIN A.An End-to-End Transformer Model for 3D Object Detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.New York:IEEE Press,2021:2906-2917.
[30]XIE Q,LAI Y K,WU J,et al.VENet:Voting Enhancement Network for 3D Object Detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.New York:IEEE Press,2021:3712-3721.
[31]WANG Y,CHEN X,CAO L,et al.Multimodal Token Fusion for Vision Transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2022:12186-12195.
[32]XIE Q,LAI Y K,WU J,et al.MLCVNet:Multi-Level Context VoteNet for 3D Object Detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2020:10447-10456.
[33]PAN X,XIA Z,SONG S,et al.3D Object Detection with Pointformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2021:7463-7472.
[34]TAO B,YAN F W,YIN Z S,et al.3D Object Detection Based on High-precision Map Enhancement[J].Journal of Jilin University(Engineering and Technology Edition),2023,53(3):802-809.