Computer Science ›› 2024, Vol. 51 ›› Issue (6): 153-160.doi: 10.11896/jsjkx.230800200

• Database & Big Data & Data Science • Previous Articles     Next Articles

Subspace-based I-nice Clustering Algorithm

HE Yifan1, HE Yulin2, CUI Laizhong1,2, HUANG Zhexue1,2   

  1. 1 College of Computer Science and Software Engineering,Shenzhen University,Shenzhen,Guangdong 518060,China
    2 Guangdong Laboratory of Artificial Intelligence and Digital Economy(Shenzhen),Shenzhen,Guangdong 518107,China
  • Received:2023-08-31 Revised:2023-12-07 Online:2024-06-15 Published:2024-06-05
  • About author:HE Yifan,born in 1997,postgraduate,is a student member of CCF(No.B7445G).His main research interests include data mining and machine learning.
    HE Yulin,born in 1982,Ph.D,research fellow,is a member of CCF(No.97303M).His main research interests include big data approximate computing technologies,multi-sample statistics theories and me-thods,and data mining and machine algorithms and applications.
  • Supported by:
    National Natural Science Foundation of China(61972261),Natural Science Foundation of Guangdong Province(2023A1515011667),Key Basic Research Foundation of Shenzhen(JCYJ20220818100205012) and Basic Research Foundations of Shenzhen(JCYJ20210324093609026).

Abstract: Subspace clustering of high-dimensional data is a hot issue in the field of unsupervised learning.The difficulty of subspace clustering lies in finding the appropriate subspaces and corresponding clusters.At present,the most existing subspace clustering algorithms have the drawbacks of high computational complexity and difficulty in parameter selection because the number of subspaces combinations is very large and the algorithmic execution time is very long for high-dimensional data.Also,the diffe-rent datasets and application scenarios require different parameter inputs.Thus,this paper proposes a new subspace clustering algorithm named sub-I-nice to recognize all clusters in subspaces.First,the sub-I-nice algorithm randomly divides the original dimensions into groups to build subspaces.Second,I-niceMO algorithm is used to recognize clusters in each subspace.Finally,the newly-designed ball model is designed to construct subspace clustering ensemble.The persuasive experiments are conducted to validate the clustering performances of sub-I-nice algorithm on synthetic datasets with noise.Experimental results show that the sub-I-nice algorithm has better accuracy and robustness compared to the other three representative clustering algorithms,thereby confirming the rationality and effectiveness of the proposed algorithm.

Key words: Subspace clustering, I-nice clustering, High-dimensional data, Unsupervised learning, Ball model

CLC Number: 

  • TP391
[1]PARSONS L,HAQUE E,LIU H.Subspace clustering for high dimensional data:a review[J].ACM SIGKDD Explorations Newsletter,2004,6(1):90-105.
[2]XU R,WUNSCH D.Survey of clustering algorithms[J].IEEE Transactions on Neural Networks,2005,16(3):645-678.
[3]ZHU Y,TING K M,CARMAN M J.Grouping points by shared subspaces for effective subspace clustering[J].Pattern Recognition,2018,83:230-244.
[4]YU Z,LUO P,YOU J,et al.Incremental semi-supervised clustering ensemble for high dimensional data clustering[J].IEEE Transactions on Knowledge and Data Engineering,2015,28(3):701-714.
[5]GOLALIPOUR K,AKBARI E,HAMIDI S S,et al.From clustering to clustering ensemble selection:A review[J].Enginee-ring Applications of Artificial Intelligence,2021,104:104388.
[6]CHEN X,YE Y,XU X,et al.A feature group weighting method for subspace clustering of high-dimensional data[J].Pattern Recognition,2012,45(1):434-446.
[7]PAUL D,SAHA S,MATHEW J.Improved subspace clustering algorithm using multi-objective framework and subspace optimization[J].Expert Systems with Applications,2020,158:113487.
[8]GAN G,NG M K P.Subspace clustering with automatic feature grouping[J].Pattern Recognition,2015,48(11):3703-3713.
[9]LIU Y,JIAO L C,SHANG F.Anefficient matrix factorizationbased low-rank representation for subspace clustering[J].Pattern Recognition,2013,46(1):284-292.
[10]ZHU P,ZHU W,HU Q,et al.Subspace clustering guided unsupervised feature selection[J].Pattern Recognition,2017,66:364-374.
[11]MASUD M A,HUANG J Z,WEI C,et al.I-nice:A new approach for identifying the number of clusters and initial cluster centres[J].Information Sciences,2018,466:129-151.
[12]AGGARWAL C C,WOLF J L,YU P S,et al.Fast algorithms for projected clustering[J].ACM SIGMOD Record,1999,28(2):61-72.
[13]AGGARWAL C C,YU P S.Finding generalized projected clusters in high dimensional spaces[C]//Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.2000:70-81.
[14]GOIL S,NAGESH H,CHOUDHARY A.Mafia:Efficient and scalable subspace clustering for very large data sets[C]//Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.1999:443-452.
[15]CHENG C H,FU A W,ZHANG Y.Entropy-based subspaceclustering for mining numerical data[C]//Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.1999:84-93.
[16]AGRAWAL R,GEHRKE J,GUNOPULOS D,et al.Automatic subspace clustering of high dimensional data for data mining applications[C]//Proceedings of the fourth ACM SIGMOD International Conference on Management of Data.1998:94-105.
[17]KAILING K,KRIEGELH P,KRÖGER P.Density-connectedsubspace clustering for high-dimensional data[C]//Proceedings of the 2004 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics,2004:246-256.
[18]ESTER M,KRIEGEL H P,SANDER J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the 1996 ACM SIGKDD Conference on Knowledge Discovery And Data Mining.1996:226-231.
[19]HAHSLER M,PIEKENBROCK M,DORAN D.DBSCAN:Fast density-based clustering with R[J].Journal of Statistical Software,2019,91:1-30.
[20]HUANG X,YE Y,GUO H,et al.DSKmeans:a new kmeans-type approach to discriminative subspace clustering[J].Know-ledge-Based Systems,2014,70:293-300.
[21]ZOGRAFOS V,ELLIS L,MESTER R.Discriminative subspace clustering[C]//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition.2013:2107-2114.
[22]YAMAGUCHI M,IRIE G,KAWANISHI T,et al.Subspacestructure-aware spectral clustering for robust subspace clustering[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision.2019:9875-9884.
[23]PASSALIS N,TEFAS A.Discriminative clustering using regularized subspace learning[J].Pattern Recognition,2019,96:106982.
[24]XIE J,GIRSHICK R,FARHADI A.Unsupervised deep embedding for clustering analysis[C]//Proceedings of the 2016 International Conference on Machine Learning.2016:478-487.
[25]HERSHEY J R,CHEN Z,LE ROUX J,et al.Deep clustering:Discriminative embeddings for segmentation and separation[C]//Proceedings of the 2016 IEEE International Conference on Acoustics,Speech and Signal Processing.2016:31-35.
[26]CHEN Y,XIAO X,ZHOU Y.Multi-view subspace clustering via simultaneously learning the representation tensor and affinity matrix[J].Pattern Recognition,2020,106:107441.
[27]WANG C,CHEN X,NIE F,et al.Directly solving normalized cut for multi-view data[J].Pattern Recognition,2022,130:108809.
[28]ELHAMIFAR E,VIDAL R.Sparse subspace clustering:Algorithm,theory,and applications[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(11):2765-2781.
[29]LIU G,LIN Z,YAN S,et al.Robust recovery of subspace structures by low-rank representation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,35(1):171-184.
[30]SHAO C L,SUN T F,DING S F.Ensemble clustering based on information entropy weighted[J].Journal of Nanjing University(Natural Sciences),2021,57(2):189-196.
[31]DUH Y,ZHANG J,WANG W J.A deep self-supervised clustering ensemble algorithm[J].CAAI Transactions on Intelligent Systems,2020,15(6):1113-1120.
[32]ZHOU P,WANG X,DU L,et al.Clustering ensemble via structured hypergraph learning[J].Information Fusion,2022,78:171-179.
[33]THORNDIKE R.Who belongs in the family?[J].Psychometri-ka,1953,18(4):267-276.
[34]ROUSSEEUW P J.Silhouettes:a graphical aid to the interpretation and validation of cluster analysis[J].Journal of Computational and Applied Mathematics,1987,20:53-65.
[35]HUBERT L,ARABIE P.Comparing partitions[J].Journal of Classification,1985,2(1):193-218.
[1] CAI Jiacheng, DONG Fangmin, SUN Shuifa, TANG Yongheng. Unsupervised Learning of Monocular Depth Estimation:A Survey [J]. Computer Science, 2024, 51(2): 117-134.
[2] CAI Mengnan, SHEN Guohua, HUANG Zhiqiu, YANG Yang. High-dimensional Data Publication Under Local Differential Privacy [J]. Computer Science, 2024, 51(2): 322-332.
[3] JING Yeyiran, YU Zeng, SHI Yunxiao, LI Tianrui. Review of Unsupervised Domain Adaptive Person Re-identification Based on Pseudo-labels [J]. Computer Science, 2024, 51(1): 72-83.
[4] XU Jie, WANG Lisong. Contrastive Clustering with Consistent Structural Relations [J]. Computer Science, 2023, 50(9): 123-129.
[5] LIANG Yunhui, GAN Jianwen, CHEN Yan, ZHOU Peng, DU Liang. Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking [J]. Computer Science, 2023, 50(7): 72-81.
[6] SHEN Yuancheng, BAN Rui, CHEN Xin, HUA Runduo, WANG Yunhai. Anomaly Detection Algorithm for Network Device Configuration Based on Configuration Statement Tree [J]. Computer Science, 2023, 50(11A): 230200128-10.
[7] XIE Zhuo, KANG Le, ZHOU Lijuan, ZHANG Zhihong. Clustering Method Based on Contrastive Learning for Multi-relation Attribute Graph [J]. Computer Science, 2023, 50(11): 62-70.
[8] CAO Dongtao, SHU Wenhao, QIAN Jin. Feature Selection Algorithm Based on Rough Set and Density Peak Clustering [J]. Computer Science, 2023, 50(10): 37-47.
[9] WANG Bin, LIANG Yudong, LIU Zhe, ZHANG Chao, LI Deyu. Study on Unsupervised Image Dehazing and Low-light Image Enhancement Algorithms Based on Luminance Adjustment [J]. Computer Science, 2023, 50(1): 123-130.
[10] SONG Jie, LIANG Mei-yu, XUE Zhe, DU Jun-ping, KOU Fei-fei. Scientific Paper Heterogeneous Graph Node Representation Learning Method Based onUnsupervised Clustering Level [J]. Computer Science, 2022, 49(9): 64-69.
[11] ZHANG Hua-wei, LU Xin-dong, ZHU Xiao-ming, SUN Jun-tao. Structure Preserved Multi-view Subspace Clustering Based on t-SVD [J]. Computer Science, 2022, 49(11A): 210800215-6.
[12] ZHANG Yu-xin, CHEN Yi-qiang. Driver Distraction Detection Based on Multi-scale Feature Fusion Network [J]. Computer Science, 2022, 49(11): 170-178.
[13] LIU Yi, MAO Ying-chi, CHENG Yang-kun, GAO Jian, WANG Long-bao. Locality and Consistency Based Sequential Ensemble Method for Outlier Detection [J]. Computer Science, 2022, 49(1): 146-152.
[14] HOU Hong-xu, SUN Shuo, WU Nier. Survey of Mongolian-Chinese Neural Machine Translation [J]. Computer Science, 2022, 49(1): 31-40.
[15] ZHOU Gang, GUO Fu-liang. Research on Ensemble Learning Method Based on Feature Selection for High-dimensional Data [J]. Computer Science, 2021, 48(6A): 250-254.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!