计算机科学 ›› 2022, Vol. 49 ›› Issue (4): 116-123.doi: 10.11896/jsjkx.210200098

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于锚点的快速无监督图嵌入

杨辉1,2, 陶力宏1,2, 朱建勇1,2, 聂飞平3   

  1. 1 华东交通大学电气与自动化工程学院 南昌 330013;
    2 江西省先进控制与优化重点实验室 南昌 330013;
    3 西北工业大学光学影像分析与学习中心 西安 710072
  • 收稿日期:2021-02-13 修回日期:2021-07-09 发布日期:2022-04-01
  • 通讯作者: 聂飞平(feipingnie@gmail.com)
  • 作者简介:(yhshuo@163.com)
  • 基金资助:
    国家自然科学基金(61963015,61733005,61863014); 江西省自然科学基金(20171ACB21039,20192BAB207024); 江西省教育厅科技计划(GJJ150552)

Fast Unsupervised Graph Embedding Based on Anchors

YANG Hui1,2, TAO Li-hong1,2, ZHU Jian-yong1,2, NIE Fei-ping3   

  1. 1 School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China;
    2 Key Laboratory of Advanced Control and Optimization of Jiangxi Province, Nanchang 330013, China;
    3 Center for Optical Image Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China
  • Received:2021-02-13 Revised:2021-07-09 Published:2022-04-01
  • About author:YANG Hui,born in 1965,Ph.D,professor,Ph.D supervisor.His main research interests include intelligent transportation system,complex system modeling,control and optimization,process industry integrated automation technology and applications.NIE Fei-ping,born in 1977,Ph.D,professor,Ph.D supervisor.His main research interests include machine lear-ning and its applications,such as pattern recognition,data mining,computer vision,image processing and information retrieval.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China(61963015,61733005,61863014),Natural Science Foundation of Jiangxi Province(20171ACB21039,20192BAB207024) and Science and Technology Project of Jiangxi Provincial Department of Education(GJJ150552).

摘要: 图嵌入降维算法由于其有效性被广泛应用。传统图嵌入算法构造K-Nearest Neighbors(K-NN)图的计算复杂度至少为O(n2d),其中n为样本数,d为样本维度。在数据量大的情况下,构造K-NN图将非常耗时,因为其计算复杂度与样本数的平方成正比,这将限制图嵌入算法在大规模数据集上的应用。为降低构图过程的计算复杂度,提出一种基于锚点的快速无监督图嵌入算法(Fast Unsupervised Graph Embedding Based on Anchors,FUGE)。该算法首先从数据集中选取锚点(代表点),然后构造数据点-锚点相似度图,最后执行图嵌入分析。由于锚点数量远小于数据量,所提方法能有效地降低构图过程的计算复杂度;不同于使用核函数来构造相似度图,该算法直接通过数据点的近邻信息来学习数据点-锚点的相似度图,这进一步加快了构图过程。整个算法的计算复杂度为O(nd2+nmd),其中m为锚点数。在基准数据集上的大量实验证明了所提算法的有效性和高效性。

关键词: K-means++, 降维, 锚点, 图嵌入, 正交约束

Abstract: Graph embedding is a widely used method for dimensionality reduction due to its computational effectiveness.The computational complexity of graph embedding method to construct traditional K-Nearest Neighbors (K-NN) graph is at least O(n2d), where n and d represents the sample size and dimensions respectively.The construction of K-NN graphs is very time-consuming since the computational complexity is proportional to the square of the number of samples in the case of a large amount of data, which will limit the application of graph embedding algorithms on large-scale data sets.To address this problem, a fast unsupervised graph embedding based on anchors (FUGE) method is proposed in this paper.FUGE first selects anchors (representative points) from the data, then constructs a similarity graph between the data and anchors, and finally performs graph embedding analysis.Since the number of anchors is much smaller than the number of data, the proposed method can effectively reduce the computational complexity of the process of graph construction.Different from using the kernel function to construct the similarity graph, FUGE directly learns the data point-anchor similarity graph by the neighbor information of the data, which further accelerates the process of graph construction.The overall computational complexity of FUGE is O(nd2+nmd), where m is the number of anchors.Extensive experiments on real-world benchmark data sets show the effectiveness and efficiency of the proposed method.

Key words: K-means++, Anchors, Dimensionality reduction, Graph embedding, Orthogonal constraint

中图分类号: 

  • TP391
[1] HE W Q,LIU B L,SUN Z C,et al.Kernel-preserving Embedding Based Subspace Learning[J].Computer Science,2021,48(6):79-85.
[2] TAO Y,BAO L L,HU H.Dimensionality Reduction MethodCombining Representation Learning and Embedded Subspace Learning[J].Computer Engineering,2021,47(6):83-87,97.
[3] ZOU C M,CHEN D.Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis[J].Computer Science,2021,48(2):121-127.
[4] CUNNINGHAM J P,YU B M.Dimensionality reduction forlarge-scale neural recordings[J].Nature Neuroscience,2014,17(11):1500-1509.
[5] WONG W K,LAI Z,WEN J,et al.Low-Rank Embedding for Robust Image Feature Extraction[J].IEEE Trans Image Process,2017,26(6):2905-2917.
[6] LIII P K.On lines and planes of closest fit to systems of points in space[J].The London,Edinburgh,and Dublin Philosophical Magazine and Journal of Science,1901,2(11):559-572.
[7] BELHUMEUR P N,HESPANHA J P,KRIEGMAN D J.Eigenfaces vs.fisherfaces:Recognition using class specific linear projection[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1997,19(7):711-720.
[8] HE X,NIYOGI P.Locality preserving projections[J].Advances in Neural Information Processing Systems,2004,16(16):153-160.
[9] BELKIN M,NIYOGI P.Laplacian Eigenmaps and SpectralTechniques for Embedding and Clustering[C]//Proceedings of the 14th International Conference on Neural Information Processing Systems:Natural and Synthetic.Cambridge:MIT Press,2001:585-591.
[10] TENENBAUM J B,DE SILVA V,LANGFORD J C.A global geometric framework for nonlinear dimensionality reduction[J].Science,2000,290(5500):2319-2323.
[11] ROWEIS S T,SAUL L K.Nonlinear dimensionality reduction by locally linear embedding[J].Science,2000,290(5500):2323-2326.
[12] YAN S,XU D,ZHANG B,et al.Graph embedding and extensions:A general framework for dimensionality reduction[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2006,29(1):40-51.
[13] CAI D,HE X,HAN J.Spectral regression:A unified subspace learning framework for content-based image retrieval[C]//Proceedings of the 15th ACM international Conference on Multi-media.New York:ACM Press,2007:403-412.
[14] LIU W,HE J,CHANG S F.Large graph construction for scalable semi-supervised learning[C]//Proceedings of the 27th International Conference on Machine Learning.New York:ACM Press,2010:679-686.
[15] CAI D.Compressed spectral regression for efficient nonlinear dimensionality reduction[C]//Proceedings of the 24th International Conference on Artificial Intelligence.California:AAAI Press,2015:3359-3365.
[16] NIE F,ZHU W,LI X.Unsupervised large graph embedding[C]//Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.California:AAAI Press,2017:2422-2428.
[17] JIANG R,FU W,WEN L,et al.Dimensionality reduction on anchorgraph with an efficient locality preserving projection[J].Neurocomputing,2016,187:109-118.
[18] YU W,NIE F,WANG F,et al.Fast and flexible large graph embedding based on anchors[J].IEEE Journal of Selected Topics in Signal Processing,2018,12(6):1465-1475.
[19] GOYAL P,FERRARA E.Graph embedding techniques,applications,and performance:A survey[J].Knowledge-Based Systems,2018,151:78-94.
[20] CAI H,ZHENG V W,CHANG K C C.A comprehensive survey of graph embedding:Problems,techniques,and applications[J].IEEE Transactions on Knowledge and Data Engineering,2018,30(9):1616-1637.
[21] GUO Y F,LI S J,YANG J Y,et al.A generalized Foley-Sammon transform based on generalized fisher discriminant criterion and its application to face recognition[J].Pattern Recognition Letters,2003,24(1/2/3):147-158.
[22] WANG H,YAN S,XU D,et al.Trace ratio vs.ratio trace for dimensionality reduction[C]//IEEE Conference on Computer Vision and Pattern Recognition.New York:IEEE Press,2007:1-8.
[23] JIA Y,NIE F,ZHANG C.Trace ratio problem revisited[J].IEEE Transactions on Neural Networks,2009,20(4):729-735.
[24] FUKUNAGA K.Introduction to statistical pattern recognition[M].Elsevier,2013.
[25] VASSILVITSKII S,ARTHUR D.k-means++:The advantages of careful seeding[C]//Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms.2006:1027-1035.
[26] BACHEM O,LUCIC M,HASSANI S H,et al.Approximate k-means++in sublinear time[C]//Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.California:AAAI Press,2016:1459-1467.
[27] NIE F,WANG X,HUANG H.Clustering and projected clustering with adaptive neighbors[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.New York:ACM Press,2014:977-986.
[28] LIU Y,NIE F,GAO Q,et al.Flexible unsupervised feature extraction for image classification[J].Neural Networks,2019,115:65-71.
[29] ZELNIK-MANOR L,PERONA P.Self-tuning spectral clustering[C]//Proceedings of the 17th International Conference on Neural Information Processing Systems.Cambridge:MIT Press,2004:1601-1608.
[30] ESTÉVEZ P A,TESMER M,PEREZ C A,et al.Normalized mutual information feature selection[J].IEEE Transactions on Neural Networks,2009,20(2):189-201.
[1] 李勇, 吴京鹏, 张钟颖, 张强.
融合快速注意力机制的节点无特征网络链路预测算法
Link Prediction for Node Featureless Networks Based on Faster Attention Mechanism
计算机科学, 2022, 49(4): 43-48. https://doi.org/10.11896/jsjkx.210800276
[2] 赵志强, 易秀双, 李婕, 王兴伟.
基于GR-AD-KNN算法的IPv6网络DoS入侵检测技术研究
Research on DoS Intrusion Detection Technology of IPv6 Network Based on GR-AD-KNN Algorithm
计算机科学, 2021, 48(6A): 524-528. https://doi.org/10.11896/jsjkx.200500001
[3] 刘嘉琛, 秦小麟, 朱润泽.
基于LSTM-Attention的RFID移动对象位置预测
Prediction of RFID Mobile Object Location Based on LSTM-Attention
计算机科学, 2021, 48(3): 188-195. https://doi.org/10.11896/jsjkx.200600134
[4] 邹承明, 陈德.
高维大数据分析的无监督异常检测方法
Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis
计算机科学, 2021, 48(2): 121-127. https://doi.org/10.11896/jsjkx.191100141
[5] 邢长征, 朱金侠, 孟祥福, 齐雪月, 朱尧, 张峰, 杨一鸣.
兴趣点推荐方法研究综述
Point-of-interest Recommendation:A Survey
计算机科学, 2021, 48(11A): 176-183. https://doi.org/10.11896/jsjkx.201100021
[6] 杨力, 李欣宇, 石怀峰, 潘成胜.
空间信息网络任务智能识别方法
Task Intelligent Identification Method for Spatial Information Network
计算机科学, 2020, 47(4): 262-269. https://doi.org/10.11896/jsjkx.190300111
[7] 吴丹丹, 吕鑫.
分布式结构下基于用户协作的匿名区域构建算法
Location Anonymous Algorithm Based on User Collaboration under Distributed Structure
计算机科学, 2019, 46(4): 158-163. https://doi.org/10.11896/j.issn.1002-137X.2019.04.025
[8] 贾旭, 孙福明, 李豪杰, 曹玉东.
基于有监督双正则NMF的静脉识别算法
Vein Recognition Algorithm Based on Supervised NMF with Two Regularization Terms
计算机科学, 2018, 45(8): 283-287. https://doi.org/10.11896/j.issn.1002-137X.2018.08.051
[9] 李香元,蔡骋,何进荣.
基于密度缩放因子的ISOMAP算法
Density Scaling Factor Based ISOMAP Algorithm
计算机科学, 2018, 45(7): 207-213. https://doi.org/10.11896/j.issn.1002-137X.2018.07.036
[10] 黄铉.
特征降维技术的研究与进展
Research and Development of Feature Dimensionality Reduction
计算机科学, 2018, 45(6A): 16-21.
[11] 张志禹, 刘思媛.
一种基于Curv-SAE特征融合的人脸降维和识别方法
Method of Face Recognition and Dimension Reduction Based on Curv-SAE Feature Fusion
计算机科学, 2018, 45(10): 267-271. https://doi.org/10.11896/j.issn.1002-137X.2018.10.049
[12] 首照宇,杨晓帆.
联合Gabor误差字典和低秩表示的人脸识别算法
Jointing Gabor Error Dictionary and Low Rank Representation for Face Recognition
计算机科学, 2017, 44(3): 296-299. https://doi.org/10.11896/j.issn.1002-137X.2017.03.060
[13] 张建辉,王会青,孙宏伟,郭芷榕,白莹莹.
基于二分迭代SAX的时序相似性度量算法
Similarity Measure Algorithm of Time Series Based on Binary-dividing SAX
计算机科学, 2017, 44(1): 247-252. https://doi.org/10.11896/j.issn.1002-137X.2017.01.046
[14] 梅玲玲,龚劬.
基于改进的自适应局部保持投影算法的人脸识别
Face Recognition Based on Improved Adaptive Locality Preserving Projection
计算机科学, 2016, 43(8): 286-291. https://doi.org/10.11896/j.issn.1002-137X.2016.08.058
[15] 曾安,郑齐弥.
基于MIC的深度置信网络研究
Deep Belief Networks Research Based on Maximum Information Coefficient
计算机科学, 2016, 43(8): 249-253. https://doi.org/10.11896/j.issn.1002-137X.2016.08.050
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!