Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220400164-7. doi: 10.11896/jsjkx.220400164

• Artificial Intelligence •

Study on Improvement of Deep Point Cloud Network Based on Multiple Emphasis Mechanisms

LIU Hui, TIAN Shuaihua

  1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
  • Online: 2023-06-10 Published: 2023-06-12
  • Corresponding author: LIU Hui (liuhui@bucea.edu.cn)
  • About author: LIU Hui, born in 1978, Ph.D., associate professor. Her main research interests include indoor 3D positioning, TomoSAR 3D reconstruction and radar signal processing, 3D point cloud classification, and microwave remote sensing image processing.
  • Supported by:
    National Natural Science Foundation of China (62176018).

Abstract: Machine vision is a key technology that enables robots to identify working objects in complex spatial environments. Kinect depth cameras and laser scanning sensors, which are commonly used in robotic systems, can acquire three-dimensional information about a target, making it possible for robots to perform more complex tasks such as assembly, disassembly, and grasping. However, this also places higher demands on the robot system's ability to process 3D information, for example 3D localization and the measurement and estimation of working-object dimensions. Taking PointNet as the baseline, this paper analyzes how soft-threshold squeeze-and-excitation, channel-wise gating, and attention mechanisms emphasize the main features, improves PointNet with each of the three mechanisms in turn, and validates the resulting networks experimentally on the publicly available ShapeNet dataset from Stanford University. The results show that the three emphasis mechanisms improve the segmentation accuracy (mean intersection over union) of 3D point clouds by 0.24%, 0.68%, and 0.93%, respectively, compared with the original PointNet. The improved method lays a foundation for accurately estimating the dimensions of working objects in robotic tasks such as assembly, disassembly, and grasping.

Key words: Machine vision, 3D point cloud, Squeeze-and-excitation, Channel-wise gating, Attention module
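
The abstract names soft-threshold squeeze-and-excitation as one of the emphasis mechanisms but does not give its layer layout. The sketch below is therefore only an illustration of the general idea, under stated assumptions: a per-channel threshold is learned from a squeezed channel statistic (here the mean absolute activation over all points), and each channel of the per-point features is then shrunk toward zero by that threshold. The function names, the bottleneck weights `w1` and `w2`, and the reduction ratio are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink x toward zero by tau; entries with |x| <= tau become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold_se(features, w1, w2):
    """Assumed soft-threshold squeeze-and-excitation over point features.

    features: (N, C) array, one C-dimensional feature per point.
    w1: (C, C // r) and w2: (C // r, C) hypothetical bottleneck weights.
    """
    # Squeeze: one statistic per channel, pooled over all N points.
    abs_mean = np.abs(features).mean(axis=0)      # shape (C,)
    # Excitation: small bottleneck MLP ending in a sigmoid gate in (0, 1).
    hidden = np.maximum(abs_mean @ w1, 0.0)       # ReLU
    scale = sigmoid(hidden @ w2)                  # shape (C,)
    # Per-channel threshold, bounded by the channel's mean magnitude.
    tau = scale * abs_mean
    # Soft thresholding suppresses small activations channel by channel.
    return soft_threshold(features, tau)

rng = np.random.default_rng(0)
features = rng.normal(size=(1024, 64))            # 1024 points, 64 channels
w1 = rng.normal(scale=0.1, size=(64, 16))         # reduction ratio r = 4
w2 = rng.normal(scale=0.1, size=(16, 64))
out = soft_threshold_se(features, w1, w2)
print(out.shape)
```

Because the threshold is a gated fraction of each channel's own mean magnitude, weak activations are zeroed while strong ones are kept with reduced amplitude, which is one way a network can emphasize its main features.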

CLC Number: 

  • TP391
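
The segmentation accuracy reported in the abstract is a mean intersection over union (mIoU). As a minimal sketch of how such a score can be computed for one shape, the example below averages the IoU over the part labels of that shape and counts a part that is absent from both prediction and ground truth as IoU 1, a convention commonly used for ShapeNet part segmentation. The function name `shape_miou` and the toy labels are hypothetical; the exact evaluation protocol used in the paper is not stated in the abstract.

```python
import numpy as np

def shape_miou(pred, target, part_ids):
    """Mean IoU over the given part labels for a single point cloud.

    pred, target: (N,) integer arrays of per-point part labels.
    part_ids: iterable of part labels valid for this object category.
    """
    ious = []
    for part in part_ids:
        pred_mask = pred == part
        target_mask = target == part
        union = np.logical_or(pred_mask, target_mask).sum()
        if union == 0:
            # Part absent in both prediction and ground truth (assumed convention).
            ious.append(1.0)
        else:
            inter = np.logical_and(pred_mask, target_mask).sum()
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy example: six points, three parts, one point mislabeled as part 1.
target = np.array([0, 0, 0, 1, 1, 2])
pred = np.array([0, 0, 1, 1, 1, 2])
score = shape_miou(pred, target, part_ids=[0, 1, 2])
print(round(score, 4))  # (2/3 + 2/3 + 1) / 3 = 7/9, about 0.7778
```

A dataset-level score is then typically obtained by averaging the per-shape values.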