Computer Science ›› 2026, Vol. 53 ›› Issue (4): 318-325.doi: 10.11896/jsjkx.250600124

• Computer Graphics & Multimedia •

Multi-stage Grasping Method for Unordered Mixed Objects Grasping Based on GraspNet

YU Lingxin, CHEN Yibo, QU Haojun, LI Guangwei, LI Jinping   

  1. School of Information Science and Engineering, University of Jinan, Jinan 250022, China
    Shandong Provincial Key Laboratory of Network Based Intelligent Computing (University of Jinan), Jinan 250022, China
    Shandong College and University Key Laboratory of Information Processing and Cognitive Computing in 13th Five-year (University of Jinan), Jinan 250022, China
  • Received: 2025-06-19  Revised: 2025-08-19  Online: 2026-04-15  Published: 2026-04-08
  • About author: YU Lingxin, born in 2000, postgraduate. Her main research interests include pattern recognition, computer vision and robot control.
    LI Jinping, born in 1968, Ph.D, professor, is a member of CCF (No.06393S). His main research interests include artificial intelligence, pattern recognition, computer vision, digital image processing and optimization algorithms.
  • Supported by:
    Shandong Provincial Project of Innovation Ability Enhancement Engineering for Technology Oriented Small and Medium-sized Enterprises (2022TSGC1047), Central Guidance Funding Projects for Local Scientific and Technological Development of Shandong Province (YDZX2024078) and University of Jinan Disciplinary Cross-Convergence Construction Project 2023 (XKJC-202310).

Abstract: Mechanical devices used in industrial sorting are typically designed for specific application scenarios and products, and often exhibit poor versatility and intelligence when faced with unordered mixed object grasping. Current point-cloud matching grasping technologies based on 3D structured-light cameras have improved flexible production capabilities to a certain extent. However, they are constrained by high hardware costs, limited feature description capabilities, high computational complexity, and sensitivity to occlusion, making it difficult to meet the demands of unordered mixed object grasping. In recent years, deep learning-based grasping technologies, represented by GraspNet, have developed rapidly, achieving pose estimation with binocular cameras. Nevertheless, these methods still suffer from suboptimal target selection strategies, limitations in pose scoring mechanisms, and significant pose localization errors. To address these challenges, this study proposes an improved three-stage grasping algorithm. In the first stage, the YOLOv10 object detection model is fused with the SAM segmentation model, combined with an optimized target selection algorithm that prioritizes unobstructed and closer targets, effectively solving the problem of poor target selection in stacked and occluded scenes. In the second stage, the GraspNet pose estimation framework is enhanced by introducing a pose filtering mechanism based on point-cloud surface normals and by reconstructing the scoring mechanism, yielding high-precision grasping poses. In the third stage, a pose fine-tuning strategy is designed using a hierarchical "hover alignment-vertical grasping" control architecture to eliminate cumulative errors during execution, ultimately addressing the issue of inaccurate real-world grasping. Experimental results demonstrate that this method significantly improves grasping efficiency, operational reliability, and cross-scenario generalization in complex environments. Moreover, by replacing 3D structured-light cameras with binocular cameras, the system cost is significantly reduced, providing a cost-effective solution for industrial automation.
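The first two stages described in the abstract can be illustrated with a short sketch. The code below is a minimal, hypothetical rendering of two ideas only: ranking detected objects so that unoccluded and closer candidates are grasped first, and discarding candidate grasp poses whose approach direction deviates too far from the local point-cloud surface normal. The function names, weights, and threshold (`select_target`, `w_occ`, `w_dist`, `max_angle_deg`) are illustrative assumptions, not the paper's actual parameters or implementation.

```python
import math

def select_target(candidates, w_occ=10.0, w_dist=1.0):
    """Rank detected objects: unoccluded candidates first, closer ones
    preferred among equals.  Each candidate is a dict with
    'occluded' (bool) and 'depth' (float, metres from the camera).
    Weights are illustrative; w_occ is set so occlusion dominates."""
    def cost(c):
        # Occlusion adds a large fixed penalty; depth breaks ties.
        return w_occ * (1.0 if c['occluded'] else 0.0) + w_dist * c['depth']
    return min(candidates, key=cost) if candidates else None

def filter_poses_by_normal(poses, max_angle_deg=30.0):
    """Keep grasp poses whose (unit) approach axis points against the
    (unit) surface normal to within max_angle_deg degrees."""
    keep = []
    for p in poses:
        a, n = p['approach'], p['normal']
        # Angle between the approach axis and the inward normal (-n).
        cos_angle = -(a[0]*n[0] + a[1]*n[1] + a[2]*n[2])
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle <= max_angle_deg:
            keep.append(p)
    return keep
```

Weighting occlusion far above distance encodes the abstract's priority order: an occluded object is never preferred over an unoccluded one merely because it is nearer.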

Key words: Unordered mixed objects grasping, Pose estimation, Target selection, Pose optimization, Binocular camera

CLC Number: TP242