Computer Science, 2021, Vol. 48, Issue (4): 111-116. doi: 10.11896/jsjkx.200800011

• Database & Big Data & Data Science •

Fast Symbolic Data Clustering Algorithm Based on Symbolic Relation Graph

ZHANG Yan-jin1, BAI Liang1,2   

  1 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2 Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan 030006, China
  • Received:2020-06-24 Revised:2020-08-05 Online:2021-04-15 Published:2021-04-09
  • About author: ZHANG Yan-jin, born in 1995, postgraduate. Her main research interests include categorical data clustering. (zhang17836204220@163.com)
    BAI Liang, born in 1982, Ph.D, professor, is a member of the China Computer Federation. His main research interests include cluster analysis.
  • Supported by:
    National Natural Science Foundation of China (61773247,61876103) and Technology Research Development Projects of Shanxi (201901D211192).

Abstract: Because practical applications generate large amounts of symbolic data, clustering symbolic data has become an important research area in cluster analysis. Many symbolic data clustering algorithms have been proposed, but when they are applied in big data environments they still suffer from high computational cost and slow running speed. This paper proposes a fast symbolic data clustering algorithm based on a symbolic relation graph. It addresses these problems by replacing the original data with a symbolic relation graph, which reduces the size of the data set. Extensive experiments show that the new algorithm is more effective than the other algorithms tested.
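
The abstract does not specify how the symbolic relation graph is built. As a rough illustration of how a graph can stand in for the raw records, the sketch below makes one assumption: nodes are (attribute, value) pairs and edge weights count how often two values co-occur in a record. The function name `build_relation_graph` and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_relation_graph(records):
    """Summarize equal-length categorical records as a co-occurrence graph.

    Assumed construction (not the paper's): nodes are (attribute_index, value)
    pairs, and an edge weight counts the records containing both endpoints.
    """
    node_counts = Counter()
    edge_weights = Counter()
    for record in records:
        nodes = [(j, v) for j, v in enumerate(record)]
        node_counts.update(nodes)
        # Nodes are ordered by attribute index, so each pair has one canonical form.
        edge_weights.update(combinations(nodes, 2))
    return node_counts, edge_weights


# Hypothetical toy data: four records, three symbolic attributes.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]
node_counts, edge_weights = build_relation_graph(records)
```

Under this assumption the graph size depends on the number of distinct symbols rather than the number of records, so adding further records with already-seen values changes only the weights, not the number of nodes.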

Key words: Clustering, Data mining, Relation graph, Similarity measure, Symbolic data

CLC Number: 

  • TP391