Computer Science, 2024, Vol. 51, Issue (6): 231-238. doi: 10.11896/jsjkx.230300154

• Computer Graphics & Multimedia •

  • Corresponding author: BAI Zhengyao (baizhy@ynu.edu.cn)
  • First author's email: lizekai0710@163.com

Point Cloud Upsampling Network Incorporating Transformer and Multi-stage Learning Framework

LI Zekai, BAI Zhengyao, XIAO Xiao, ZHANG Yihan, YOU Yilin   

  1. School of Information Science and Engineering,Yunnan University,Kunming 650500,China
  • Received:2023-03-20 Revised:2023-07-01 Online:2024-06-15 Published:2024-06-05
  • About author: LI Zekai, born in 1997, postgraduate. His main research interests include point cloud processing and 3D modeling.
    BAI Zhengyao, born in 1967, Ph.D, professor, master supervisor. His main research interests include signal processing, image processing, pattern recognition and machine learning, etc.
  • Supported by:
    Yunnan Provincial Major Science and Technology Special Plan(202002AD080001).


Abstract: Drawing on the Transformer's powerful feature-encoding capabilities in natural language processing and computer vision, and inspired by multi-stage learning frameworks, a point cloud upsampling network incorporating a Transformer and a multi-stage learning framework, MSPUiT, is designed. The network adopts a two-stage model. The first stage is a dense point generation network: a multi-layer Transformer encoder progressively transforms the local geometric information and local feature information of the input point cloud into high-level semantic features; a feature expansion module upsamples the point cloud features in feature space; and a coordinate regression module maps the point cloud from feature space back to Euclidean space to generate an initial dense point cloud. The second stage is a point-by-point refinement network: a Transformer encoder encodes the latent semantic features of the dense point cloud and combines them with the semantic features from the previous stage to obtain the complete semantic features of the point cloud; a feature refinement unit extracts per-point error features from the geometric information and semantic features of the dense point cloud; and an error regression module computes per-point coordinate offsets in Euclidean space from these error features, refining the dense point cloud point by point so that its points are distributed more uniformly and lie closer to the true object surface. In extensive experiments on the large synthetic dataset PU1K, the high-resolution point clouds generated by MSPUiT reduce the Chamfer distance (CD), the Hausdorff distance (HD), and the distance from the generated points to the original patches (P2F) to 0.501×10⁻³, 5.958×10⁻³ and 1.756×10⁻³, respectively. Experimental results show that point clouds upsampled by MSPUiT have smoother surfaces and fewer noise points, and the quality of the generated point clouds is higher than that of current mainstream point cloud upsampling networks.
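The evaluation metrics named above are standard point-set distances. As a minimal sketch (not the paper's evaluation code; the averaging and squared-vs-unsquared conventions vary between upsampling papers), CD and HD between two point sets can be computed with NumPy:

```python
import numpy as np

def pairwise_sqdist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Squared Euclidean distances between all point pairs: shape (N, M)
    return np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    # Symmetric Chamfer distance: mean squared nearest-neighbor
    # distance in both directions
    d = pairwise_sqdist(p, q)
    return float(np.mean(np.min(d, axis=1)) + np.mean(np.min(d, axis=0)))

def hausdorff_distance(p: np.ndarray, q: np.ndarray) -> float:
    # Symmetric Hausdorff distance: worst-case nearest-neighbor distance
    d = np.sqrt(pairwise_sqdist(p, q))
    return float(max(np.max(np.min(d, axis=1)), np.max(np.min(d, axis=0))))
```

Identical point sets give zero under both metrics; moving a single point away from the other set raises HD to that point's nearest-neighbor distance, while CD dilutes the error by averaging.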

Key words: Transformer encoder, Multi-stage learning framework, Feature conversion, Point cloud upsampling, Deep learning
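The feature expansion step described in the abstract (upsampling in feature space before coordinate regression) is commonly realised by duplicating each point's feature vector r times and appending a distinct code to each copy, so that the coordinate regressor can map the copies to different positions. A shape-level sketch, assuming this generic duplicate-and-code scheme rather than the paper's exact module:

```python
import numpy as np

def expand_features(feat: np.ndarray, r: int) -> np.ndarray:
    """Expand per-point features (N, C) to (r*N, C + r) for r-times upsampling."""
    n, _ = feat.shape
    rep = np.repeat(feat, r, axis=0)      # each row duplicated r times: (r*N, C)
    codes = np.tile(np.eye(r), (n, 1))    # one-hot code distinguishing the copies
    return np.concatenate([rep, codes], axis=1)

# 4 points with 8-D features, upsampled 4x -> 16 expanded feature vectors
expanded = expand_features(np.random.rand(4, 8), 4)
print(expanded.shape)  # (16, 12)
```

A coordinate regression module (typically shared MLPs) would then map each expanded (C + r)-dimensional row to a 3D coordinate, yielding the dense point cloud.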

CLC number: TP391