计算机科学 ›› 2018, Vol. 45 ›› Issue (7): 207-213.doi: 10.11896/j.issn.1002-137X.2018.07.036

• 人工智能 • 上一篇    下一篇

基于密度缩放因子的ISOMAP算法

李香元,蔡骋,何进荣   

  1. 西北农林科技大学信息工程学院 陕西 杨凌712100
  • 收稿日期:2017-04-19 出版日期:2018-07-30 发布日期:2018-07-30
  • 作者简介:李香元(1993-),男,硕士生,主要研究领域为机器学习;蔡 骋(1979-),男,博士,教授,硕士生导师,CCF会员,主要研究领域为计算机视觉与机器学习,E-mail:chengcai@nwsuaf.edu.cn(通信作者);何进荣(1984-),男,博士,讲师,CCF会员,主要研究领域为计算机视觉与数据降维。
  • 基金资助:
    本文受国家自然科学基金项目(61202188),西北农林科技大学博士科研启动基金项目(2452015302),杨凌示范区科技计划项目(2016NY-31)资助。

Density Scaling Factor Based ISOMAP Algorithm

LI Xiang-yuan, CAI Cheng, HE Jin-rong   

  1. College of Information Engineering,Northwest A & F University,Yangling,Shaanxi 712100,China
  • Received:2017-04-19 Online:2018-07-30 Published:2018-07-30

摘要: 等度量映射(ISOMAP)算法是一种被广泛应用的非线性无监督降维算法,通过保持各个观测样本间的测地距离进行等距嵌入,从而实现高维空间向低维空间的坐标转换。但在实际应用中,观测数据无可避免地会存在噪声,由于测地距离的计算对噪声比较敏感,并且也没有考虑数据集的密度分布,导致ISOMAP算法降维后低维坐标表示存在几何变形。针对这一缺点,根据局部密度的思想,提出一种基于密度缩放因子的ISOMAP(Density Scaling Factor Based ISOMAP,D-ISOMAP)算法。在传统的ISOMAP算法框架下,首先,针对每个观测样本计算一个局部密度缩放因子;然后,在测地距离的计算过程中,将直接相邻的两个样本之间的测地距离除以这两个样本密度缩放因子的乘积;最后,通过最短路径算法求得改进后的距离矩阵,并对其进行降维处理。改进的测地距离在密度较大的区域被缩小,而在密度较小的区域被放大,这样可以减小噪声对降维效果的影响,提升可视化和聚类效果。人工数据集和UCI数据集上的实验结果表明,在数据集的可视化和聚类效果方面,D-ISOMAP算法较经典的无监督降维算法具有一定的优势。

关键词: ISOMAP, 降维, 流形学习, 密度缩放因子, 噪声数据

Abstract: ISOMAP is a widely used nonlinear unsupervised dimensionality reduction algorithm,and preserves the coor-dinate transformation from high-dimensional space to low-dimensional space by keeping the geodesic distance between the observation samples.However,practical data are inevitably corrupted by noise.Since the calculation of geodesic distance is sensitive to noise and does not consider the density distribution of dataset,the geometric deformation of the ISOMAP algorithm is happening after dimensionality reduction.For this shortcoming,according to the idea of local density,a density scaling factor based ISOMAP algorithm called D-ISOMAP was proposed.In the framework of traditional ISOMAP algorithm,local density scaling factor is first calculated for each observation sample.And then the original geo-desic distance of two samples is divided by the product of their density scaling factors.Finally,the improved distance matrix is obtained by the shortest path algorithm.Since the improved geodesic distance is reduced in the larger density region and enlarged in the smaller density region,it can achieve a good effect in noisy situations.The experimental results on artificial dataset and UCI dataset show that the D-ISOMAP algorithm is better than the classical unsupervised dimensionality reduction algorithms in terms of visualization and clustering of data sets.

Key words: Density scaling factor, Dimensionality reduction, ISOMAP, Manifold learning, Noise data

中图分类号: 

  • TP181
[1]LEW M S,SEBE N,DJERABA C,et al.Content-based multimedia information retrieval:State of the art and challenges[J].ACM Transactions on Multimedia Computing,Communications,and Applications (TOMM),2006,2(1):1-19.
[2]ZHANG T,TANG Y Y,FANG B,et al.Document Clustering in Correlation Similarity Measure Space[J].IEEE Transactions on Knowledge & Data Engineering,2012,24(6):1002-1013.
[3]BAR-JOSEPH Z,GITTER A,SIMON I.Studying and modelling dynamic biological processes using time-series gene expression data[J].Nature Reviews Genetics,2012,13(8):552.
[4]DONOHO D L.High-Dimensional Data Analysis:The Cursesand Blessings of Dimensionality[C]∥Lecture — Math Challenges of Century.2000:178-183.
[5]ROWEIS S T,SAUL L K.Nonlinear dimensionality reduction by locally linear embedding[J].Science,2000,290(5500):2323-2326.
[6]H RDLE W K,HL VKA Z.Principal Component Analysis[J].IEEE Trans on Automatic Control,2015,29(1):163-183.
[7]AMARI S I,CICHOCKI A.A new learning algorithm for blind source separation[C]∥Advances in Neural Information Processing Systems.1996:757-763.
[8]MARTINEZ A M,KAK A C.PCA versus LDA[J].IEEETransactions on Pattern Analysis & Machine Intelligence,2001,23(2):228-233.
[9]COX M AA,COX T F.Multidimensional Scaling[J].Journal of the Royal Statistical Society,2008,46(2):1050-1057.
[10]TENENBAUM J B,SILVA V D,LANGFORD J C.A Global Geometric Framework for Nonlinear Dimensionality Reduction[J].Science,2000,290(5500):2319-2323.
[11]YANG J,FRANGI A F,YANG J,et al.KPCA plus LDA:acomplete kernel Fisher discriminant framework for feature extraction and recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,27(2):230-244.
[12]YU J,KIM S B.Density-based geodesic distance for identifying the noisy and nonlinear clusters[J].Information Sciences,2016,360:231-243.
[13]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[14]GENG X,ZHAN D C,ZHOU Z H.Supervised nonlinear dimensionality reduction for visualization and classification[J].IEEE Transactions on Systems Man & Cybernetics Part B Cyberne-tics,2005,35(6):1098-1107.
[15]ZHANG Z,CHOW T W S,ZHAO M.M-Isomap:Orthogonalconstrained marginal isomap for nonlinear dimensionality reduction[J].IEEE Transactions on Cybernetics,2013,43(1):180-191.
[16]YANG B,XIANG M,ZHANG Y.Multi-manifold Discriminant Isomap for visualization and classification[J].Pattern Recognition,2016,55:215-230.
[17]HE J R,DING L X,LI Z K,et al.Margin Discriminant Projection for Dimensionality Reduction[J].Journal of Software,2014,25(4):826-838.(in Chinese)
何进荣,丁立新,李照奎,等.基于边界判别投影的数据降维[J].软件学报,2014,25(4):826-838.
[18]BERNSTEIN M,SILVA V D,LANGFORD J C,et al.Graph Approximations to Geodesics on Embedded Manifolds[J].Stanford University,2001,24(9):153-158.
[19]MAIER M,LUXBURG U V,HEIN M.Influence of graph construction on graph-based clustering[C]∥Conference on Neural Information Processing Systems.Vancouver,British Columbia,Canada,DBLP,2008:1025-1032.
[20]FU Y,MA Y.Graph Embedding for Pattern Analysis[M].New York:Springer,2013.
[21]LEE J S,OLAFSSON S.Data clustering by minimizing disconnectivity[J].Information Sciences,2011,181(4):732-746.
[1] 杨辉, 陶力宏, 朱建勇, 聂飞平.
基于锚点的快速无监督图嵌入
Fast Unsupervised Graph Embedding Based on Anchors
计算机科学, 2022, 49(4): 116-123. https://doi.org/10.11896/jsjkx.210200098
[2] 赵亮, 张洁, 陈志奎.
基于双图正则化的自适应多模态鲁棒特征学习
Adaptive Multimodal Robust Feature Learning Based on Dual Graph-regularization
计算机科学, 2022, 49(4): 124-133. https://doi.org/10.11896/jsjkx.210300078
[3] 陈长伟, 周晓峰.
快速局部协同表示分类器及其在人脸识别中的应用
Fast Local Collaborative Representation Based Classifier and Its Applications in Face Recognition
计算机科学, 2021, 48(9): 208-215. https://doi.org/10.11896/jsjkx.200800155
[4] 赵志强, 易秀双, 李婕, 王兴伟.
基于GR-AD-KNN算法的IPv6网络DoS入侵检测技术研究
Research on DoS Intrusion Detection Technology of IPv6 Network Based on GR-AD-KNN Algorithm
计算机科学, 2021, 48(6A): 524-528. https://doi.org/10.11896/jsjkx.200500001
[5] 白子轶, 毛懿荣, 王瑞平.
视频人脸识别进展综述
Survey on Video-based Face Recognition
计算机科学, 2021, 48(3): 50-59. https://doi.org/10.11896/jsjkx.210100210
[6] 刘嘉琛, 秦小麟, 朱润泽.
基于LSTM-Attention的RFID移动对象位置预测
Prediction of RFID Mobile Object Location Based on LSTM-Attention
计算机科学, 2021, 48(3): 188-195. https://doi.org/10.11896/jsjkx.200600134
[7] 邹承明, 陈德.
高维大数据分析的无监督异常检测方法
Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis
计算机科学, 2021, 48(2): 121-127. https://doi.org/10.11896/jsjkx.191100141
[8] 杨章静, 王文博, 黄璞, 张凡龙, 王昕.
基于局部加权表示的线性回归分类器及人脸识别
Local Weighted Representation Based Linear Regression Classifier and Face Recognition
计算机科学, 2021, 48(11A): 351-359. https://doi.org/10.11896/jsjkx.210100173
[9] 张俊, 王杨, 李坤豪, 李昌, 赵传信.
基于流形学习的多源传感器体域网数据融合模型
Multi-source Sensor Body Area Network Data Fusion Model Based on Manifold Learning
计算机科学, 2020, 47(8): 323-328. https://doi.org/10.11896/jsjkx.191000012
[10] 杨力, 李欣宇, 石怀峰, 潘成胜.
空间信息网络任务智能识别方法
Task Intelligent Identification Method for Spatial Information Network
计算机科学, 2020, 47(4): 262-269. https://doi.org/10.11896/jsjkx.190300111
[11] 贾旭, 孙福明, 李豪杰, 曹玉东.
基于有监督双正则NMF的静脉识别算法
Vein Recognition Algorithm Based on Supervised NMF with Two Regularization Terms
计算机科学, 2018, 45(8): 283-287. https://doi.org/10.11896/j.issn.1002-137X.2018.08.051
[12] 黄铉.
特征降维技术的研究与进展
Research and Development of Feature Dimensionality Reduction
计算机科学, 2018, 45(6A): 16-21.
[13] 张志禹, 刘思媛.
一种基于Curv-SAE特征融合的人脸降维和识别方法
Method of Face Recognition and Dimension Reduction Based on Curv-SAE Feature Fusion
计算机科学, 2018, 45(10): 267-271. https://doi.org/10.11896/j.issn.1002-137X.2018.10.049
[14] 张绍群.
基于紧集子覆盖的流形学习算法
Manifold Learning Algorithm Based on Compact Setsub-coverage
计算机科学, 2017, 44(Z6): 88-91. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.018
[15] 叶军,金忠.
基于对偶超图正则化的概念分解算法及其在数据表示中的应用
Hypergraph Dual Regularization Concept Factorization Algorithm and Its Application in Data Representation
计算机科学, 2017, 44(7): 309-314. https://doi.org/10.11896/j.issn.1002-137X.2017.07.056
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!