Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 221200152-8. doi: 10.11896/jsjkx.221200152

• Image Processing & Multimedia Technology •

Object Region Guided 3D Target Detection in RGB-D Scenes

MIAO Yongwei1,2, SHAN Feng2, DU Sicheng3, WANG Jinrong1, ZHANG Xudong4

  1. 1 School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China
    2 School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
    3 School of Natural Sciences, King's College London, London N1C4BQ
    4 School of Information Science and Technology, Zhejiang Shuren University, Hangzhou 310015, China
  • Published: 2023-11-09
  • Corresponding author: ZHANG Xudong (xdzhang@zjsru.edu.cn)
  • Author contact: ywmiao@hznu.edu.cn
  • About author: ZHANG Xudong, born in 1982, Ph.D., associate professor. His main research interests include computer graphics and computer vision.
  • Supported by:
    National Natural Science Foundation of China (61972458), Natural Science Foundation of Zhejiang Province, China (LZ23F020002) and Zhejiang Public Welfare Application Research Project (LGF22F020006).

Abstract: 3D object detection in RGB-D data of indoor scenes is an important problem in computer graphics and 3D vision. Existing methods for RGB-D scenes adapt poorly to complex backgrounds and struggle to exploit object region information together with the point cloud features of the scene. To address these shortcomings, an object-region-guided 3D object detection framework is proposed that fuses the global and local features of the point cloud and eliminates background interference. The framework takes the RGB-D data of a scene as input. First, the 2D regions of the objects to be detected are extracted from the color image and the objects are coarsely classified. Each 2D bounding box is then lifted to its corresponding 3D oblique cone (frustum) region, and the RGB-D data inside that region are converted to point cloud data. Next, guided by the object region classification information, features are extracted from the oblique cone point cloud, and the global and local point cloud features are fused by feature transformation and max-pooling aggregation. The fused features are then used to predict, for each sampling point, a probability score that reflects how strongly the point belongs to the foreground or the background. According to this score, the foreground and background points are segmented, and a masked point cloud is formed by removing the background points from the scene. Finally, object center points are generated by voting in the masked point cloud, and proposals and 3D object predictions are produced with the aid of the object region information. In addition, a corner loss is added to improve the accuracy of the bounding boxes. The network is trained on the public SUN RGB-D dataset. Experimental results show that, compared with traditional methods, the proposed framework improves detection accuracy: the point cloud object detection accuracy under the same evaluation metric reaches 59.1%, and the 3D bounding boxes of objects can be estimated accurately even in regions with strong occlusion or sparse sampling points.
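
The lifting and masking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a depth map aligned with the color image, and a fixed threshold of 0.5 on the per-point foreground probability, which in the paper would come from the trained network rather than being supplied by hand. The function names `frustum_points` and `mask_foreground` are illustrative.

```python
import numpy as np

def frustum_points(depth, box2d, fx, fy, cx, cy):
    """Back-project the depth pixels inside a 2D box into camera-frame 3D points.

    box2d is (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
    The intrinsics fx, fy, cx, cy are assumptions of this sketch.
    """
    x0, y0, x1, y1 = box2d
    vs, us = np.mgrid[y0:y1, x0:x1]      # pixel rows (v) and columns (u) in the box
    z = depth[y0:y1, x0:x1]
    valid = z > 0                        # drop pixels without a depth reading
    u, v, z = us[valid], vs[valid], z[valid]
    x = (u - cx) * z / fx                # pinhole camera model
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # array of shape (N, 3)

def mask_foreground(points, fg_prob, threshold=0.5):
    """Keep only the points whose foreground probability exceeds the threshold."""
    return points[fg_prob > threshold]
```

In this sketch the points that survive `mask_foreground` play the role of the masked point cloud on which voting would subsequently be performed.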

Key words: 3D object detection, Foreground point cloud extraction, Point cloud segmentation, RGB-D, Regional information
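
The corner loss mentioned in the abstract is not specified on this page. As a rough illustration only, one common form penalizes the distance between the eight corners of the predicted box and those of the ground-truth box. The sketch below assumes a box given as (center, size, heading), with heading as a rotation about an upright z axis, and uses the mean Euclidean corner distance; the paper's exact formulation may differ. The names `box_corners` and `corner_loss` are hypothetical.

```python
import numpy as np

def box_corners(center, size, heading):
    """Return the 8 corners, shape (8, 3), of a box rotated by `heading` about z.

    The z-up convention and the (length, width, height) ordering of `size`
    are assumptions of this sketch.
    """
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    local = signs * np.asarray(size, dtype=float) / 2.0   # corners in the box frame
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return local @ rot.T + np.asarray(center, dtype=float)

def corner_loss(pred, gt):
    """Mean Euclidean distance between corresponding corners of two boxes."""
    diff = box_corners(*pred) - box_corners(*gt)
    return float(np.linalg.norm(diff, axis=1).mean())
```

Because all eight corners depend jointly on center, size and heading, a loss of this kind couples the three box parameters instead of penalizing each one separately.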

CLC number: 

  • TP391