Computer Science ›› 2025, Vol. 52 ›› Issue (3): 77-85. doi: 10.11896/jsjkx.240200102

• 3D Vision and Metaverse •

3D Reconstruction of Single-view Sketches Based on Attention Mechanism and Contrastive Loss

ZHONG Yue1, GU Jieming2

  1. Institute of Evidence Law and Forensic Science, China University of Political Science and Law, Beijing 100088, China
  2. School of Cyberspace Science, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2024-02-26  Revised: 2024-09-25  Online: 2025-03-15  Published: 2025-03-07
  • Corresponding author: ZHONG Yue (zhongyue@cupl.edu.cn)
  • About author: ZHONG Yue, born in 1993, Ph.D., lecturer. Her main research interests include computer vision, 3D reconstruction, sketch recognition, and multi-modality learning.

Abstract: The metaverse is an immersive, interconnected three-dimensional (3D) virtual space. With the development of technologies such as virtual reality and artificial intelligence, the metaverse is reshaping human lifestyles. 3D reconstruction is one of the core technologies of the metaverse, and deep learning-based 3D reconstruction is an active research topic in computer vision. Hand-drawn sketches inevitably suffer from foreground-background ambiguity, variation in drawing style, and viewpoint deviation. To address these problems, a single-view sketch 3D reconstruction method based on an attention mechanism and a contrastive loss is proposed; it requires no additional annotations or user interaction during reconstruction. The model first rectifies the spatial position of the input sketch with a spatial transformer module. It then uses a normalization-based attention module to establish long-range, multi-level dependencies over the sketch, exploiting the global structure of the sketch to ease the reconstruction difficulty caused by foreground-background ambiguity. Finally, a contrastive loss function is designed so that the model learns latent-space features that are invariant to sketch style and viewpoint, which improves its robustness to input sketches. Experimental results on multiple datasets demonstrate the effectiveness and advanced performance of the proposed model.

Key words: Deep learning, Free-hand sketch, 3D reconstruction, Single view, Attention mechanism

CLC Number:

  • TP391.41
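
The abstract describes a contrastive loss that pushes the model toward latent features invariant to sketch style and viewpoint, but it does not give the formula. As a rough illustration only, the sketch below uses an InfoNCE-style objective, which is an assumed stand-in and not necessarily the loss used in the paper. The function names `cosine` and `info_nce`, the temperature value, and the toy latent vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed form, not the paper's).

    anchor, positive: latent codes of two sketches of the SAME shape drawn
    in different styles or from different viewpoints.
    negatives: latent codes of sketches of OTHER shapes.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]


# Hypothetical latent codes: two views of one chair, plus two other shapes.
chair_view_a = [1.0, 0.1, 0.0]
chair_view_b = [0.9, 0.2, 0.1]
table_view = [0.0, 1.0, 0.2]
lamp_view = [0.1, 0.0, 1.0]

# Loss when the paired sketches really share a shape ...
aligned = info_nce(chair_view_a, chair_view_b, [table_view, lamp_view])
# ... versus when a different shape is wrongly treated as the positive.
misaligned = info_nce(chair_view_a, table_view, [chair_view_b, lamp_view])
print(aligned < misaligned)
```

Under this assumed objective, minimizing the loss pulls together the codes of sketches that depict the same shape and pushes apart codes of different shapes, which is one common way to obtain the style- and view-invariance the abstract refers to.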