Computer Science, 2026, Vol. 53, Issue (5): 109-118. doi: 10.11896/jsjkx.250400084

• Database & Big Data & Data Science •

Review of Uniform Manifold Approximation and Projection

ZHANG Run, LI Xiaobin, XU Yamin   

  1. School of Mathematics, Southwest Jiaotong University, Chengdu 611756, China
  • Received: 2025-04-17 Revised: 2025-06-18 Published: 2026-05-08
  • About author: LI Xiaobin, born in 1983, Ph.D., master's supervisor, is a member of CCF (No.Z1644M). His main research interests include geometric topology and its applications.
  • Supported by:
    Fundamental Research Funds for the Central Universities (2682021ZTPY043, 2682025ZTPY001) and National Natural Science Foundation of China (11501470, 11426187).

Abstract: This paper systematically introduces the theoretical foundations, algorithmic implementation, and recent developments of the UMAP algorithm. UMAP is a nonlinear dimensionality reduction method grounded in algebraic topology and category theory. It aims to preserve the local structure of high-dimensional data while faithfully capturing its global distribution. The algorithm has two stages. First, it constructs a weighted k-nearest neighbor graph that encodes local geometric relationships, with neighborhood sizes adaptively determined by data density. Second, it learns a low-dimensional embedding by minimizing the cross-entropy between the high-dimensional and low-dimensional neighborhood graphs, thereby preserving topological structure and enabling effective visualization. The survey further reviews several notable extensions of UMAP: 1) Supervised and Semi-supervised UMAP, which enhance class separation by incorporating label information; 2) Parameterized UMAP, which integrates neural networks to achieve generalizable nonlinear mappings; 3) DensMAP, which preserves data distribution characteristics through density correlation optimization; 4) AlignedUMAP, which enables aligned embeddings across datasets or temporal sequences; 5) Progressive UMAP, which addresses dynamic embedding challenges for streaming data and out-of-sample extensions; 6) GNUMAP, which combines UMAP with graph neural networks; 7) MultiMAP, which applies UMAP to multimodal data fusion.
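The two stages summarized in the abstract can be illustrated with a small pure-Python sketch. It assumes the formulation commonly given for UMAP: directed weights exp(-(d - rho_i)/sigma_i) with sigma_i chosen so that each point's neighbor weights sum to log2(k), symmetrization by a + b - a*b, and low-dimensional weights 1/(1 + a*d^(2b)). The function names, the toy data, and the choice a = b = 1 are illustrative assumptions, not the reference implementation. The sketch only evaluates the cross-entropy for two fixed embeddings and does not perform the stochastic gradient optimization that learns an embedding.

```python
import math

def fuzzy_knn_graph(points, k):
    """Stage 1 (sketch): symmetric weighted k-NN graph with a per-point scale sigma_i."""
    n = len(points)
    target = math.log2(k)
    directed = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # k nearest neighbors of point i as (distance, index) pairs
        nbrs = sorted((math.dist(points[i], points[j]), j)
                      for j in range(n) if j != i)[:k]
        rho = nbrs[0][0]  # distance to the closest neighbor
        # Bisection on sigma_i so that the neighbor weights sum to log2(k);
        # denser regions end up with a smaller sigma_i.
        lo, hi = 0.0, 1000.0
        for _ in range(64):
            sigma = (lo + hi) / 2.0
            total = sum(math.exp(-max(0.0, d - rho) / sigma) for d, _idx in nbrs)
            if total > target:
                hi = sigma
            else:
                lo = sigma
        for d, j in nbrs:
            directed[i][j] = math.exp(-max(0.0, d - rho) / sigma)
    # Symmetrize with the probabilistic union a + b - a*b
    return [[directed[i][j] + directed[j][i] - directed[i][j] * directed[j][i]
             for j in range(n)] for i in range(n)]

def low_dim_weights(embedding, a=1.0, b=1.0):
    """Low-dimensional similarities 1 / (1 + a * d^(2b)); a = b = 1 is an assumption."""
    n = len(embedding)
    return [[0.0 if i == j else
             1.0 / (1.0 + a * math.dist(embedding[i], embedding[j]) ** (2.0 * b))
             for j in range(n)] for i in range(n)]

def fuzzy_cross_entropy(v, w, eps=1e-12):
    """Stage 2 objective (sketch): cross-entropy between the two weighted graphs."""
    total = 0.0
    n = len(v)
    for i in range(n):
        for j in range(i + 1, n):
            # Clip to (0, 1) so the logarithms stay finite
            p = min(max(v[i][j], eps), 1.0 - eps)
            q = min(max(w[i][j], eps), 1.0 - eps)
            total += p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total

# Toy data (hypothetical): two well-separated clusters of four points in 3D
cluster = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 1.5)]
points = cluster + [(x + 10.0, y + 10.0, z + 10.0) for x, y, z in cluster]
v = fuzzy_knn_graph(points, k=3)

# One 2D layout keeps the clusters apart; the other pairs up points across clusters
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
good = square + [(x + 20.0, y + 20.0) for x, y in square]
bad = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0),
       (0.1, 0.0), (10.1, 0.0), (0.1, 10.0), (10.1, 10.0)]
good_loss = fuzzy_cross_entropy(v, low_dim_weights(good))
bad_loss = fuzzy_cross_entropy(v, low_dim_weights(bad))
print(good_loss, bad_loss)
```

In this toy setting the layout that keeps the two clusters separate yields the lower cross-entropy. A full method would minimize this quantity over the embedding coordinates.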

Key words: UMAP, Supervised learning, Semi-supervised learning, Parameterized UMAP, DensMAP, AlignedUMAP, Progressive UMAP, GNUMAP, MultiMAP

CLC Number: 

  • TP311