Computer Science ›› 2025, Vol. 52 ›› Issue (8): 195-203. doi: 10.11896/jsjkx.240900086

• Computer Graphics & Multimedia •

IBSNet: A Neural Implicit Field for IBS Prediction in Single-view Scanned Point Cloud

YUAN Youwen, JIN Shuo, ZHAO Xi   

  1. College of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
  • Received: 2024-09-13 Revised: 2025-01-24 Online: 2025-08-15 Published: 2025-08-08
  • Corresponding author: ZHAO Xi (xi.zhao@xjtu.edu.cn)
  • About author: YUAN Youwen, born in 2001, postgraduate (2193412689@stu.xjtu.edu.cn). His main research interests include 3D point cloud processing and analysis, and 3D interaction relationship analysis.
    ZHAO Xi, born in 1985, Ph.D, professor, Ph.D supervisor, is a member of CCF (No.86701M). Her main research interests include 3D data analysis, processing, and synthesis.
  • Supported by:
    National Key Research and Development Program of China (2022YFB3303202) and National Natural Science Foundation of China (62072366, U23A20312).

Abstract: Analyzing the spatial relationships between 3D objects is of great significance for scene understanding and interaction. For example, analyzing the spatial relationship between a robot and an object can guide the robot to grasp the object more accurately, and learning the spatial relationships between objects in real scenes can guide the generation of virtual scenes that look more natural or better satisfy interaction requirements. However, single-view scanned point clouds captured by RGB-D cameras or LiDAR usually suffer from artifacts and noise, so existing spatial relationship analysis methods often fail to make accurate predictions on such input, which limits downstream tasks such as scene classification, analysis, and synthesis. To handle spatial relationship analysis on single-view scanned point clouds, this paper uses the interaction bisector surface (IBS) to express spatial relationships and represents the IBS implicitly through a dual-object differential unsigned distance field. Inspired by the implicit function learning methods widely used in recent years, this paper designs a neural implicit field, IBSNet, to fit this differential unsigned distance field: the network takes the single-view scanned point clouds of two objects as input, extracts features from each with a multi-layer self-attention point cloud encoder, combines the two feature vectors, and feeds them into a dual-object unsigned distance decoder that returns the unsigned distance values at query points; the IBS is then extracted from this implicit representation. Comparative experiments against other methods (a geometric method, IMNet, and Grasping Field) are conducted on the ICON dataset, where each scene is scanned from 26 different viewpoints to simulate single-view scanned point clouds and the data are split into training and test sets at the scene level. The robustness of each method is also tested on single-view scanned point clouds with different degrees of incompleteness and noise. Experimental results show that the proposed neural implicit field is robust to incomplete single-view scanned point clouds and can efficiently predict IBS with accurate shape.
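To make the differential unsigned distance field concrete: writing d(q, O) for the unsigned distance from a query point q to an object O, the field described above is f(q) = d(q, O1) - d(q, O2), and the IBS is its zero level set {q : f(q) = 0}. The brute-force Python sketch below is a hypothetical illustration of that definition, not the paper's code; it recovers the IBS directly from point clouds via nearest-neighbor queries, which works on complete scans but degrades on incomplete single-view input, precisely the gap the learned field is meant to fill.

# A minimal sketch (not the authors' implementation) of the dual-object
# differential unsigned distance field f(q) = d(q, O1) - d(q, O2),
# with the IBS taken as the zero level set {q : f(q) = 0}.
import numpy as np
from scipy.spatial import cKDTree

def differential_udf(queries, pc1, pc2):
    # Unsigned distance to object 1 minus unsigned distance to object 2,
    # both approximated by nearest-neighbor distance to the point clouds.
    d1, _ = cKDTree(pc1).query(queries)
    d2, _ = cKDTree(pc2).query(queries)
    return d1 - d2

def extract_ibs_points(queries, pc1, pc2, eps=1e-2):
    # Keep the query samples whose differential UDF is (near) zero.
    f = differential_udf(queries, pc1, pc2)
    return queries[np.abs(f) < eps]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pc1 = rng.uniform(-1.0, -0.2, size=(2048, 3))     # toy "object 1"
    pc2 = rng.uniform(0.2, 1.0, size=(2048, 3))       # toy "object 2"
    grid = rng.uniform(-1.0, 1.0, size=(100_000, 3))  # dense query samples
    ibs = extract_ibs_points(grid, pc1, pc2)
    print(ibs.shape)  # samples lying approximately on the bisector surface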

Key words: Spatial relationship analysis, Interaction bisector surface, Single-view scanned point cloud, Neural implicit field, Unsigned distance field
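The abstract also outlines the network layout: two multi-layer self-attention point cloud encoders, feature combination, and a dual-object unsigned distance decoder evaluated at query points. The PyTorch sketch below mirrors that layout under stated assumptions: every layer size, module name, the residual attention blocks, and the choice of nn.MultiheadAttention with max-pooling are illustrative guesses, not the published IBSNet architecture.

import torch
import torch.nn as nn

class SelfAttentionEncoder(nn.Module):
    # Per-point embedding followed by stacked self-attention,
    # pooled into one global shape code (all sizes are assumptions).
    def __init__(self, dim=128, heads=4, depth=2):
        super().__init__()
        self.embed = nn.Linear(3, dim)
        self.blocks = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True)
             for _ in range(depth)]
        )

    def forward(self, pts):                   # pts: (B, N, 3)
        x = self.embed(pts)
        for attn in self.blocks:
            a, _ = attn(x, x, x)              # self-attention over the points
            x = x + a                         # residual connection
        return x.max(dim=1).values            # (B, dim) global feature

class IBSNetSketch(nn.Module):
    # Two encoders -> combined code -> unsigned distances to both objects.
    def __init__(self, dim=128):
        super().__init__()
        self.enc1 = SelfAttentionEncoder(dim)
        self.enc2 = SelfAttentionEncoder(dim)
        self.decoder = nn.Sequential(
            nn.Linear(2 * dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 2),                # (UDF to object 1, UDF to object 2)
        )

    def forward(self, pc1, pc2, queries):     # queries: (B, Q, 3)
        code = torch.cat([self.enc1(pc1), self.enc2(pc2)], dim=-1)
        code = code.unsqueeze(1).expand(-1, queries.shape[1], -1)
        udf = self.decoder(torch.cat([code, queries], dim=-1)).abs()
        return udf[..., 0] - udf[..., 1]      # differential UDF; IBS at zero

if __name__ == "__main__":
    net = IBSNetSketch()
    pc1, pc2 = torch.rand(1, 2048, 3), torch.rand(1, 2048, 3)
    q = torch.rand(1, 4096, 3)
    print(net(pc1, pc2, q).shape)             # torch.Size([1, 4096])

Training such a sketch would regress the two predicted distances against ground-truth unsigned distances sampled around the objects, after which the IBS can be read off as the zero level set of the returned difference.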

CLC Number: TP391

[1] HUANG Z Y, XU J Z, DAI S S, et al. NIFT: Neural interaction field and template for object manipulation[C]//2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023: 1875-1881.
[2] SHE Q J, HU R Z, XU J Z, et al. Learning high-DOF reaching-and-grasping via dynamic representation of gripper-object interaction[J]. ACM Transactions on Graphics, 2022, 41(4): 1-14.
[3] ZHAO X, WANG H, KOMURA T, et al. Indexing 3D Scenes Using the Interaction Bisector Surface[J]. ACM Transactions on Graphics, 2014, 33(3): 1-14.
[4] PARK J J, FLORENCE P, STRAUB J, et al. DeepSDF: Learning continuous signed distance functions for shape representation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 165-174.
[5] CHEN Z Q, ZHANG H. Learning implicit fields for generative shape modeling[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 5939-5948.
[6] SITZMANN V, ZOLLHÖFER M, WETZSTEIN G. Scene representation networks: Continuous 3D-structure-aware neural scene representations[J]. arXiv:1906.01618, 2019.
[7] WANG P, LIU L J, LIU Y, et al. NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021: 27171-27183.
[8] FU Q C, XU Q S, ONG Y S, et al. Geo-Neus: Geometry-consistent neural implicit surfaces learning for multi-view reconstruction[J]. Advances in Neural Information Processing Systems, 2022, 35: 3403-3416.
[9] CHIBANE J, MIR A, PONS-MOLL G. Neural unsigned distance fields for implicit function learning[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020: 21638-21652.
[10] HU R, ZHU C, VAN KAICK O, et al. Interaction context (ICON): towards a geometric functionality descriptor[J]. ACM Transactions on Graphics, 2015, 34(4): 1-12.
[11] WU Z, SONG S, KHOSLA A, et al. 3D ShapeNets: A Deep Representation for Volumetric Shapes[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015: 1912-1920.
[12] MATURANA D, SCHERER S. VoxNet: A 3D Convolutional Neural Network for real-time object recognition[C]//2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2015: 922-928.
[13] RIEGLER G, ULUSOY A O, GEIGER A. OctNet: Learning Deep 3D Representations at High Resolutions[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017: 3577-3586.
[14] CHARLES R Q, SU H, KAICHUN M, et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017: 652-660.
[15] ZHAO H, JIANG L, JIA J, et al. Point Transformer[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). 2021: 16259-16268.
[16] GUO M H, CAI J X, LIU Z N, et al. PCT: Point cloud transformer[J]. Computational Visual Media, 2021, 7(2): 187-199.
[17] LIU M Y, YANG Q M, HU G H, et al. 3D point cloud object detection algorithm based on Transformer[J]. Journal of Northwestern Polytechnical University, 2023, 41(6): 1190-1197.
[18] LIU X H, BAI Z Y, XU Z, et al. Multi-guided Point Cloud Registration Network Combined with Attention Mechanism[J]. Computer Science, 2024, 51(2): 142-150.
[19] KARUNRATANAKUL K, YANG J, ZHANG Y, et al. Grasping Field: Learning Implicit Representations for Human Grasps[C]//2020 International Conference on 3D Vision (3DV). 2020: 333-344.
[20] ZHAO X, ZHANG B, WU J, et al. Relationship-Based Point Cloud Completion[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 28(12): 4940-4950.
[21] HUANG Z Y, DAI S S, XU K, et al. DINA: Deformable INteraction Analogy[J]. Graphical Models, 2024, 133: 101217.
[22] XUAN H B, LI X Z, ZHANG J S, et al. Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 22268-22278.
[23] ZHAO X, HU R, GUERRERO P, et al. Relationship templates for creating scene variations[J]. ACM Transactions on Graphics, 2016, 35(6): 1-13.
[24] HUANG Z, XU J, DAI S, et al. NIFT: Neural Interaction Field and Template for Object Manipulation[C]//2023 IEEE International Conference on Robotics and Automation (ICRA). 2023: 1875-1881.
[25] WALD J, DHAMO H, NAVAB N, et al. Learning 3D semantic scene graphs from 3D indoor reconstructions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 3961-3970.
[26] LIU Y Y, LONG C J, ZHANG Z X, et al. Explore Contextual Information for 3D Scene Graph Generation[J]. IEEE Transactions on Visualization and Computer Graphics, 2023, 29(12): 5556-5568.
[27] CHABRA R, LENSSEN J E, ILG E, et al. Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction[C]//Computer Vision - ECCV 2020, Lecture Notes in Computer Science. 2020: 608-625.
[28] CHEN Z, ZHANG H. Learning Implicit Fields for Generative Shape Modeling[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019: 5939-5948.
[29] YUAN W, KHOT T, HELD D, et al. PCN: Point Completion Network[C]//2018 International Conference on 3D Vision (3DV). 2018: 728-744.