Computer Science (计算机科学), 2024, Vol. 51, Issue (6): 153-160. doi: 10.11896/jsjkx.230800200

• Database & Big Data & Data Science •

Subspace-based I-nice Clustering Algorithm

HE Yifan1, HE Yulin2, CUI Laizhong1,2, HUANG Zhexue1,2

  1 College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
  2 Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen, Guangdong 518107, China
  • Received: 2023-08-31  Revised: 2023-12-07  Online: 2024-06-15  Published: 2024-06-05
  • Corresponding author: HE Yulin (yulinhe@gml.ac.cn)
  • About author: HE Yifan (396981852@qq.com), born in 1997, postgraduate, is a student member of CCF (No.B7445G). His main research interests include data mining and machine learning.
    HE Yulin, born in 1982, Ph.D, research fellow, is a member of CCF (No.97303M). His main research interests include big data approximate computing technologies, multi-sample statistics theories and methods, and data mining and machine learning algorithms and applications.
  • Supported by: National Natural Science Foundation of China (61972261), Natural Science Foundation of Guangdong Province (2023A1515011667), Key Basic Research Foundation of Shenzhen (JCYJ20220818100205012) and Basic Research Foundation of Shenzhen (JCYJ20210324093609026).

Abstract: Subspace clustering of high-dimensional data is an active research problem in unsupervised learning. Its difficulty lies in finding the appropriate subspaces and the clusters within them. Most existing subspace clustering algorithms suffer from high computational complexity and difficult parameter selection, because the number of subspace combinations in high-dimensional data is very large, so the algorithmic execution time is very long, and different datasets and application scenarios require different parameter settings. This paper therefore proposes a subspace-based I-nice clustering algorithm, abbreviated sub-I-nice, to identify the number of clusters in the subspaces of high-dimensional data. First, the algorithm randomly partitions the original dimensions into groups and generates subspace samples from the dimension groups. Second, the recent I-niceMO algorithm is used to cluster the data in each subspace. Finally, a newly designed ball model ensembles the base clustering results from all subspaces. The clustering performance of sub-I-nice is validated in detail on high-dimensional synthetic datasets with noise. Experimental results show that sub-I-nice achieves better accuracy and robustness than three other representative clustering algorithms, confirming its rationality and effectiveness.

Key words: Subspace clustering, I-nice clustering, High-dimensional data, Unsupervised learning, Ball model

CLC Number: TP391
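The three-stage pipeline described in the abstract (random dimension grouping, per-subspace clustering, ensemble of the base clusterings) can be sketched as follows. This is a minimal illustration, not the paper's implementation: plain k-means with a farthest-point initialization stands in for I-niceMO (which also estimates the number of clusters), and a simple co-association vote stands in for the ball-model ensemble; all function names and the 0.5 threshold are assumptions for the sketch.

```python
import numpy as np

def random_dimension_groups(n_dims, n_groups, rng):
    """Step 1: randomly partition the original dimensions into
    disjoint groups; each group spans one subspace."""
    return np.array_split(rng.permutation(n_dims), n_groups)

def farthest_point_init(X, k, rng):
    """Spread the initial centers out (greedy farthest-point init)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    return np.array(centers, dtype=float)

def kmeans(X, k, rng, n_iter=50):
    """Plain k-means as a stand-in base clusterer; the paper instead
    uses I-niceMO, which also identifies k from the data."""
    centers = farthest_point_init(X, k, rng)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def sub_inice_sketch(X, n_groups=3, k=2, seed=0):
    """Step 2: cluster every subspace; Step 3: fuse the base
    clusterings.  A co-association matrix replaces the paper's
    ball-model ensemble in this sketch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    for dims in random_dimension_groups(X.shape[1], n_groups, rng):
        base = kmeans(X[:, dims], k, rng)
        co += base[:, None] == base[None, :]   # label-invariant agreement
    co /= n_groups
    # Points that co-cluster in more than half of the subspaces
    # are merged into one final cluster.
    final = np.full(n, -1, dtype=int)
    next_id = 0
    for i in range(n):
        if final[i] == -1:
            final[(co[i] > 0.5) & (final == -1)] = next_id
            next_id += 1
    return final
```

On well-separated data, e.g. two Gaussian blobs in six dimensions, every random two-dimensional subspace yields the same base partition, so the co-association vote recovers the two clusters exactly; the ball model in the paper plays this fusion role for the general, noisy case.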