计算机科学 ›› 2018, Vol. 45 ›› Issue (7): 207-213.doi: 10.11896/j.issn.1002-137X.2018.07.036
李香元,蔡骋,何进荣
LI Xiang-yuan, CAI Cheng, HE Jin-rong
摘要: 等度量映射(ISOMAP)算法是一种被广泛应用的非线性无监督降维算法,通过保持各个观测样本间的测地距离进行等距嵌入,从而实现高维空间向低维空间的坐标转换。但在实际应用中,观测数据无可避免地会存在噪声,由于测地距离的计算对噪声比较敏感,并且也没有考虑数据集的密度分布,导致ISOMAP算法降维后低维坐标表示存在几何变形。针对这一缺点,根据局部密度的思想,提出一种基于密度缩放因子的ISOMAP(Density Scaling Factor Based ISOMAP,D-ISOMAP)算法。在传统的ISOMAP算法框架下,首先,针对每个观测样本计算一个局部密度缩放因子;然后,在测地距离的计算过程中,将直接相邻的两个样本之间的测地距离除以这两个样本密度缩放因子的乘积;最后,通过最短路径算法求得改进后的距离矩阵,并对其进行降维处理。改进的测地距离在密度较大的区域被缩小,而在密度较小的区域被放大,这样可以减小噪声对降维效果的影响,提升可视化和聚类效果。人工数据集和UCI数据集上的实验结果表明,在数据集的可视化和聚类效果方面,D-ISOMAP算法较经典的无监督降维算法具有一定的优势。
中图分类号:
[1]LEW M S,SEBE N,DJERABA C,et al.Content-based multimedia information retrieval:State of the art and challenges[J].ACM Transactions on Multimedia Computing,Communications,and Applications (TOMM),2006,2(1):1-19. [2]ZHANG T,TANG Y Y,FANG B,et al.Document Clustering in Correlation Similarity Measure Space[J].IEEE Transactions on Knowledge & Data Engineering,2012,24(6):1002-1013. [3]BAR-JOSEPH Z,GITTER A,SIMON I.Studying and modelling dynamic biological processes using time-series gene expression data[J].Nature Reviews Genetics,2012,13(8):552. [4]DONOHO D L.High-Dimensional Data Analysis:The Cursesand Blessings of Dimensionality[C]∥Lecture — Math Challenges of Century.2000:178-183. [5]ROWEIS S T,SAUL L K.Nonlinear dimensionality reduction by locally linear embedding[J].Science,2000,290(5500):2323-2326. [6]H RDLE W K,HL VKA Z.Principal Component Analysis[J].IEEE Trans on Automatic Control,2015,29(1):163-183. [7]AMARI S I,CICHOCKI A.A new learning algorithm for blind source separation[C]∥Advances in Neural Information Processing Systems.1996:757-763. [8]MARTINEZ A M,KAK A C.PCA versus LDA[J].IEEETransactions on Pattern Analysis & Machine Intelligence,2001,23(2):228-233. [9]COX M AA,COX T F.Multidimensional Scaling[J].Journal of the Royal Statistical Society,2008,46(2):1050-1057. [10]TENENBAUM J B,SILVA V D,LANGFORD J C.A Global Geometric Framework for Nonlinear Dimensionality Reduction[J].Science,2000,290(5500):2319-2323. [11]YANG J,FRANGI A F,YANG J,et al.KPCA plus LDA:acomplete kernel Fisher discriminant framework for feature extraction and recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,27(2):230-244. [12]YU J,KIM S B.Density-based geodesic distance for identifying the noisy and nonlinear clusters[J].Information Sciences,2016,360:231-243. [13]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496. [14]GENG X,ZHAN D C,ZHOU Z H.Supervised nonlinear dimensionality reduction for visualization and classification[J].IEEE Transactions on Systems Man & Cybernetics Part B Cyberne-tics,2005,35(6):1098-1107. [15]ZHANG Z,CHOW T W S,ZHAO M.M-Isomap:Orthogonalconstrained marginal isomap for nonlinear dimensionality reduction[J].IEEE Transactions on Cybernetics,2013,43(1):180-191. [16]YANG B,XIANG M,ZHANG Y.Multi-manifold Discriminant Isomap for visualization and classification[J].Pattern Recognition,2016,55:215-230. [17]HE J R,DING L X,LI Z K,et al.Margin Discriminant Projection for Dimensionality Reduction[J].Journal of Software,2014,25(4):826-838.(in Chinese) 何进荣,丁立新,李照奎,等.基于边界判别投影的数据降维[J].软件学报,2014,25(4):826-838. [18]BERNSTEIN M,SILVA V D,LANGFORD J C,et al.Graph Approximations to Geodesics on Embedded Manifolds[J].Stanford University,2001,24(9):153-158. [19]MAIER M,LUXBURG U V,HEIN M.Influence of graph construction on graph-based clustering[C]∥Conference on Neural Information Processing Systems.Vancouver,British Columbia,Canada,DBLP,2008:1025-1032. [20]FU Y,MA Y.Graph Embedding for Pattern Analysis[M].New York:Springer,2013. [21]LEE J S,OLAFSSON S.Data clustering by minimizing disconnectivity[J].Information Sciences,2011,181(4):732-746. |
[1] | 杨辉, 陶力宏, 朱建勇, 聂飞平. 基于锚点的快速无监督图嵌入 Fast Unsupervised Graph Embedding Based on Anchors 计算机科学, 2022, 49(4): 116-123. https://doi.org/10.11896/jsjkx.210200098 |
[2] | 赵亮, 张洁, 陈志奎. 基于双图正则化的自适应多模态鲁棒特征学习 Adaptive Multimodal Robust Feature Learning Based on Dual Graph-regularization 计算机科学, 2022, 49(4): 124-133. https://doi.org/10.11896/jsjkx.210300078 |
[3] | 陈长伟, 周晓峰. 快速局部协同表示分类器及其在人脸识别中的应用 Fast Local Collaborative Representation Based Classifier and Its Applications in Face Recognition 计算机科学, 2021, 48(9): 208-215. https://doi.org/10.11896/jsjkx.200800155 |
[4] | 赵志强, 易秀双, 李婕, 王兴伟. 基于GR-AD-KNN算法的IPv6网络DoS入侵检测技术研究 Research on DoS Intrusion Detection Technology of IPv6 Network Based on GR-AD-KNN Algorithm 计算机科学, 2021, 48(6A): 524-528. https://doi.org/10.11896/jsjkx.200500001 |
[5] | 白子轶, 毛懿荣, 王瑞平. 视频人脸识别进展综述 Survey on Video-based Face Recognition 计算机科学, 2021, 48(3): 50-59. https://doi.org/10.11896/jsjkx.210100210 |
[6] | 刘嘉琛, 秦小麟, 朱润泽. 基于LSTM-Attention的RFID移动对象位置预测 Prediction of RFID Mobile Object Location Based on LSTM-Attention 计算机科学, 2021, 48(3): 188-195. https://doi.org/10.11896/jsjkx.200600134 |
[7] | 邹承明, 陈德. 高维大数据分析的无监督异常检测方法 Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis 计算机科学, 2021, 48(2): 121-127. https://doi.org/10.11896/jsjkx.191100141 |
[8] | 杨章静, 王文博, 黄璞, 张凡龙, 王昕. 基于局部加权表示的线性回归分类器及人脸识别 Local Weighted Representation Based Linear Regression Classifier and Face Recognition 计算机科学, 2021, 48(11A): 351-359. https://doi.org/10.11896/jsjkx.210100173 |
[9] | 张俊, 王杨, 李坤豪, 李昌, 赵传信. 基于流形学习的多源传感器体域网数据融合模型 Multi-source Sensor Body Area Network Data Fusion Model Based on Manifold Learning 计算机科学, 2020, 47(8): 323-328. https://doi.org/10.11896/jsjkx.191000012 |
[10] | 杨力, 李欣宇, 石怀峰, 潘成胜. 空间信息网络任务智能识别方法 Task Intelligent Identification Method for Spatial Information Network 计算机科学, 2020, 47(4): 262-269. https://doi.org/10.11896/jsjkx.190300111 |
[11] | 贾旭, 孙福明, 李豪杰, 曹玉东. 基于有监督双正则NMF的静脉识别算法 Vein Recognition Algorithm Based on Supervised NMF with Two Regularization Terms 计算机科学, 2018, 45(8): 283-287. https://doi.org/10.11896/j.issn.1002-137X.2018.08.051 |
[12] | 黄铉. 特征降维技术的研究与进展 Research and Development of Feature Dimensionality Reduction 计算机科学, 2018, 45(6A): 16-21. |
[13] | 张志禹, 刘思媛. 一种基于Curv-SAE特征融合的人脸降维和识别方法 Method of Face Recognition and Dimension Reduction Based on Curv-SAE Feature Fusion 计算机科学, 2018, 45(10): 267-271. https://doi.org/10.11896/j.issn.1002-137X.2018.10.049 |
[14] | 张绍群. 基于紧集子覆盖的流形学习算法 Manifold Learning Algorithm Based on Compact Setsub-coverage 计算机科学, 2017, 44(Z6): 88-91. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.018 |
[15] | 叶军,金忠. 基于对偶超图正则化的概念分解算法及其在数据表示中的应用 Hypergraph Dual Regularization Concept Factorization Algorithm and Its Application in Data Representation 计算机科学, 2017, 44(7): 309-314. https://doi.org/10.11896/j.issn.1002-137X.2017.07.056 |
|