Computer Science ›› 2026, Vol. 53 ›› Issue (4): 318-325. doi: 10.11896/jsjkx.250600124

• Computer Graphics & Multimedia •

  • Corresponding author: LI Jinping (ise_lijp@ujn.edu.cn)
  • About author: YU Lingxin (1379993737@qq.com)

Multi-stage Grasping Method for Unordered Mixed Objects Grasping Based on GraspNet

YU Lingxin, CHEN Yibo, QU Haojun, LI Guangwei, LI Jinping   

  1. School of Information Science and Engineering, University of Jinan, Jinan 250022, China
    Shandong Provincial Key Laboratory of Network Based Intelligent Computing(University of Jinan), Jinan 250022, China
    Shandong College and University Key Laboratory of Information Processing and Cognitive Computing in 13th Five-year(University of Jinan), Jinan 250022, China
  • Received:2025-06-19 Revised:2025-08-19 Published:2026-04-15 Online:2026-04-08
  • About author: YU Lingxin, born in 2000, postgraduate. Her main research interests include pattern recognition, computer vision and robot control.
    LI Jinping, born in 1968, Ph.D, professor, is a member of CCF (No.06393S). His main research interests include artificial intelligence, pattern recognition, computer vision, digital image processing and optimization algorithms.
  • Supported by:
    Shandong Provincial Project of Innovation Ability Enhancement Engineering for Technology Oriented Small and Medium-sized Enterprises(2022TSGC1047),Central Guidance Funding Projects for Local Scientific and Technological Development of Shandong Province(YDZX2024078) and University of Jinan Disciplinary Cross-Convergence Construction Project 2023 (XKJC-202310).


Abstract: Mechanical devices used in industrial sorting are typically designed for specific application scenarios and products, often exhibiting poor versatility and intelligence when faced with unordered mixed object grasping. Current point cloud matching grasping technologies based on 3D structured light cameras have improved flexible production capabilities to a certain extent. However, they are constrained by high hardware costs, limited feature description capabilities, high computational complexity, and sensitivity to occlusions, making it difficult to meet the demands of unordered mixed object grasping. In recent years, deep learning-based grasping technologies, represented by GraspNet, have developed rapidly, achieving pose estimation through binocular cameras. Nevertheless, these methods still suffer from suboptimal target selection strategies, limitations in pose scoring mechanisms, and significant pose localization errors. To address these challenges, this study proposes an improved three-stage grasping algorithm. In the first stage, the YOLOv10 object detection model is fused with the SAM segmentation model, combined with an optimized target selection algorithm that prioritizes unobstructed and closer targets, effectively solving the problem of poor target selection in stacked and occluded scenarios. In the second stage, the GraspNet pose estimation framework is enhanced by introducing a pose filtering mechanism based on point cloud surface normals and reconstructing the scoring mechanism to obtain high-precision grasping poses. In the third stage, a pose fine-tuning strategy is designed using a hierarchical "hover alignment, then vertical grasping" control architecture to eliminate cumulative errors during execution, addressing the issue of inaccurate real-world grasping. Experimental results demonstrate that this method significantly improves grasping efficiency, operational reliability, and cross-scenario generalization in complex environments. Moreover, by replacing 3D structured light cameras with binocular cameras, the system cost is significantly reduced, providing a cost-effective solution for industrial automation.
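The stage-1 selection rule (prefer unoccluded, then nearest targets) and the stage-2 normal-based pose filter described above can be sketched as follows. This is an illustrative sketch only: the function names, dictionary keys, and the 30° alignment threshold are assumptions, not details taken from the paper.

```python
import numpy as np

def select_target(detections):
    """Stage-1 heuristic sketch: among detected objects, prefer those not
    occluded by other objects, and of those pick the one closest to the
    camera.  Each detection is a dict with hypothetical keys 'occluded'
    (bool) and 'depth' (distance along the optical axis, in metres)."""
    candidates = [d for d in detections if not d["occluded"]] or detections
    return min(candidates, key=lambda d: d["depth"])

def filter_poses_by_normal(grasps, max_angle_deg=30.0):
    """Stage-2 sketch: keep only grasp candidates whose approach direction
    is roughly anti-parallel to the local surface normal, so the gripper
    approaches the surface head-on.  Each grasp is a dict with hypothetical
    keys 'approach' and 'normal' (3-vectors in the camera frame)."""
    cos_thresh = np.cos(np.deg2rad(max_angle_deg))
    kept = []
    for g in grasps:
        a = np.asarray(g["approach"], dtype=float)
        n = np.asarray(g["normal"], dtype=float)
        # The approach vector should point into the surface: -a·n near 1.
        cos_angle = -np.dot(a, n) / (np.linalg.norm(a) * np.linalg.norm(n))
        if cos_angle >= cos_thresh:
            kept.append(g)
    return kept
```

The surviving candidates would then be rescored and the best pose executed via the stage-3 "hover alignment, then vertical grasping" sequence; the threshold trades off candidate recall against approach stability.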

Key words: Unordered mixed objects grasping, Pose estimation, Target selection, Pose optimization, Binocular camera

CLC number: TP242

References
[1]GUO H K.Application of Artificial Intelligence Technology in Mechanical Automation[J].Electronic Technology,2024,53(10):218-219.
[2]ZHAO Y,HUANG Q.Application of Intelligent Sensors in Industrial Automation[J].Smart China,2025(1):126-128.
[3]YAN J X.Research on Robotic Sorting Technology for Stacked Parts Based on Deep Learning[D].Hangzhou:Zhejiang University,2024.
[4]ZHANG H J,XIONG Z,LAO D B,et al.Monocular vision measurement system based on EPnP algorithm[J].Infrared and Laser Engineering,2019,48(5):0517005.
[5]LOWE D G.Distinctive image features from scale-invariant keypoints[J].International Journal of Computer Vision,2004,60(2):91-110.
[6]RUBLEE E,RABAUD V,KONOLIGE K,et al.ORB:An efficient alternative to SIFT or SURF[C]//2011 International Conference on Computer Vision.New York:IEEE,2011:2564-2571.
[7]DALAL N,TRIGGS B.Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.New York:IEEE,2005:886-893.
[8]ZHANG Q P,CAO Y.Research on three-dimensional reconstruction algorithm of weak textured objects in indoor scenes[J].Laser & Optoelectronics Progress,2021,58(8):0810017.
[9]BESL P J,MCKAY N D.Method for registration of 3-D shapes[C]//Proceedings of SPIE.1992:586-606.
[10]TOMBARI F,SALTI S,DI STEFANO L.Unique signatures of histograms for local surface description[C]//Computer Vision-ECCV 2010.Heidelberg:Springer,2010:356-369.
[11]RUSU R B,BLODOW N,BEETZ M.Fast point feature histograms(FPFH) for 3D registration[C]//2009 IEEE International Conference on Robotics and Automation.New York:IEEE,2009:3212-3217.
[12]JOHNSON A E.Spin-images:a representation for 3-D surface matching:CMU-RI-TR-97-47[R].Pittsburgh:Carnegie Mellon University,1997.
[13]JIANG Y,MOSESON S,SAXENA A.Efficient grasping from rgbd images:Learning using a new rectangle representation[C]//2011 IEEE International Conference on Robotics and Automation.IEEE,2011:3304-3311.
[14]DEPIERRE A,DELLANDRÉA E,CHEN L.Jacquard:A large scale dataset for robotic grasp detection[C]//2018 IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS).IEEE,2018:3511-3516.
[15]XIANG Y,SCHMIDT T,NARAYANAN V,et al.Posecnn:A convolutional neural network for 6d object pose estimation in cluttered scenes[J].arXiv:1711.00199,2017.
[16]FANG H S,WANG C,GOU M,et al.Graspnet-1billion:A large-scale benchmark for general object grasping[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:11444-11453.
[17]KLEEBERGER K,BORMANN R,KRAUS W,et al.A survey on learning-based robotic grasping[J].Current Robotics Reports,2020,1:239-249.
[18]QI C R,YI L,SU H,et al.Pointnet++:Deep hierarchical feature learning on point sets in a metric space[C]//Advances in Neural Information Processing Systems.2017.
[19]ZHOU Y,TUZEL O.Voxelnet:End-to-end learning for point cloud based 3d object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:4490-4499.
[20]PENG S,LIU Y,HUANG Q,et al.Pvnet:Pixel-wise voting network for 6dof pose estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:4561-4570.
[21]JIANG P,ERGU D,LIU F,et al.A Review of Yolo algorithm developments[J].Procedia Computer Science,2022,199:1066-1073.
[22]GIRSHICK R.Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:1440-1448.
[23]WANG A,CHEN H,LIU L,et al.Yolov10:Real-time end-to-end object detection[J].Advances in Neural Information Processing Systems,2024,37:107984-108011.
[24]KIRILLOV A,MINTUN E,RAVI N,et al.Segment anything[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:4015-4026.
[25]FISCHLER M A,BOLLES R C.Random sample consensus:a paradigm for model fitting with applications to image analysis and automated cartography[J].Communications of the ACM,1981,24(6):381-395.