Computer Science ›› 2025, Vol. 52 ›› Issue (3): 77-85. doi: 10.11896/jsjkx.240200102

• 3D Vision and Metaverse •

3D Reconstruction of Single-view Sketches Based on Attention Mechanism and Contrastive Loss

ZHONG Yue1, GU Jieming2   

  1 Institute of Evidence Law and Forensic Science, China University of Political Science and Law, Beijing 100088, China
  2 School of Cyberspace Science, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2024-02-26 Revised: 2024-09-25 Online: 2025-03-15 Published: 2025-03-07
  • About author: ZHONG Yue, born in 1993, Ph.D., lecturer. Her main research interests include computer vision, 3D reconstruction, sketch recognition, and multi-modality learning.

Abstract: The metaverse is an immersive, interconnected three-dimensional (3D) virtual space. With the development of technologies such as virtual reality and artificial intelligence, the metaverse is reshaping human lifestyles. 3D reconstruction is a core technique for the metaverse, and deep learning-based 3D reconstruction has become a popular research direction in computer vision. Hand-drawn sketches suffer from unavoidable foreground-background ambiguity, variation in drawing style, and viewpoint differences. To address these problems, a single-view sketch 3D reconstruction model based on an attention mechanism and a contrastive loss is proposed; it requires no additional annotations or user interaction. The model first rectifies the spatial layout of the input sketch using spatial transformers, and then uses a normalization-based attention module to establish long-distance and multi-level dependencies within the sketch. The resulting global structure information alleviates the reconstruction difficulty caused by foreground-background ambiguity. Furthermore, a contrastive loss function is designed to encourage the model to learn view-invariant and style-invariant latent features of sketches, thereby improving robustness. Experimental results on multiple datasets demonstrate the effectiveness of the proposed model.
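
The abstract does not give the exact form of the contrastive loss, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss, under the assumption that each latent vector in `anchors` and the vector at the same index in `positives` come from sketches of the same shape drawn from different viewpoints or in different styles. The function names, the temperature value, and the use of in-batch negatives are illustrative assumptions, not the authors' formulation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of latent vectors.

    Assumption (not from the paper): anchors[i] and positives[i] encode the
    same 3D shape sketched under a different view or style; every other
    positives[j] in the batch acts as a negative for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarities between anchor i and every candidate, scaled.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]     # same shape, slightly different view
    mismatched = [[0.1, 0.9], [0.9, 0.1]]  # pairs swapped
    print(info_nce_loss(anchors, matched) < info_nce_loss(anchors, mismatched))
```

Minimizing a loss of this kind pulls together the latent features of sketches that depict the same shape and pushes apart those of different shapes, which is one way to encourage the view-invariant and style-invariant features the abstract describes.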

Key words: Deep learning, Free-hand sketch, 3D reconstruction, Single view, Attention mechanism

CLC Number: 

  • TP391.41