3D Reconstruction of Single-view Sketches Based on Attention Mechanism and Contrastive Loss

doi:10.11896/jsjkx.240200102

Abstract

The metaverse is an immersive, interconnected three-dimensional (3D) virtual space. With the development of technologies such as virtual reality and artificial intelligence, the metaverse is reshaping human lifestyles. 3D reconstruction is a core technique for the metaverse, and deep learning-based 3D reconstruction has become a popular research direction in computer vision. Hand-drawn sketches suffer from unavoidable foreground-background ambiguity, variations in drawing style, and viewpoint differences. To address these problems, a single-view sketch 3D reconstruction model based on attention mechanisms and contrastive loss is proposed; it requires no additional annotations or user interaction. The model first rectifies the spatial layout of the input sketch using spatial transformers, and then uses a normalized attention module to establish long-range, multi-level dependencies across the sketch. The resulting global structural information alleviates the reconstruction difficulty caused by foreground-background ambiguity. Furthermore, a contrastive loss function is designed to encourage the model to learn view-invariant and style-invariant latent features of sketches, improving robustness. Experimental results on multiple datasets demonstrate the effectiveness and advanced performance of the proposed model.
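
The abstract does not give the form of the contrastive loss, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss, written in NumPy. It assumes each sketch embedding in a batch is paired with an embedding of the same shape drawn from a different viewpoint or in a different style, and that the other rows of the batch act as negatives. The function names, the pairing scheme, and the temperature value are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form, not taken from the paper).

    anchors[i] and positives[i] are latent features of the same 3D shape
    sketched from a different view or in a different style; every other
    row of `positives` is treated as a negative for anchors[i].
    """
    a = l2_normalize(np.asarray(anchors, dtype=float))
    p = l2_normalize(np.asarray(positives, dtype=float))
    logits = a @ p.T / temperature                        # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # matched pairs sit on the diagonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))                          # hypothetical sketch embeddings
    z_other_view = z + 0.01 * rng.normal(size=z.shape)    # same shapes, slightly perturbed
    print("aligned pairs:   ", info_nce_loss(z, z_other_view))
    print("mismatched pairs:", info_nce_loss(z, np.roll(z, 1, axis=0)))
```

Minimizing a loss of this kind pulls embeddings of the same shape together across views and styles while pushing different shapes apart, which is the behavior the abstract attributes to its contrastive term. In the demonstration, correctly matched pairs yield a lower loss than deliberately mismatched ones.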

Key words: Deep learning, Free-hand sketch, 3D reconstruction, Single view, Attention mechanism

CLC Number: TP391.41

ZHONG Yue, GU Jieming. 3D Reconstruction of Single-view Sketches Based on Attention Mechanism and Contrastive Loss[J]. Computer Science, 2025, 52(3): 77-85.

