Computer Science (计算机科学), 2021, Vol. 48, Issue (4): 111-116. doi: 10.11896/jsjkx.200800011
张岩金1, 白亮1,2
ZHANG Yan-jin1, BAI Liang1,2
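Abstract: Because large amounts of symbolic data are generated in real applications, symbolic data clustering has become an important research area in cluster analysis. Many symbolic data clustering algorithms have been proposed, but when applied in big-data settings they still suffer from high computational cost and slow running speed. This paper proposes a fast symbolic data clustering algorithm based on a symbolic relation graph. The algorithm uses the symbolic relation graph in place of the original data, which reduces the size of the data set and thereby addresses the cost and speed problem. Extensive experimental analysis shows that the new algorithm is effective compared with other algorithms.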
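The abstract does not say how the symbolic relation graph is built, so the following is only an illustrative sketch under an assumption of ours: nodes are (attribute index, value) symbols and each edge weight counts the records in which two symbols co-occur. The function name `build_symbol_graph` and the toy records are hypothetical and are not taken from the paper. The sketch is meant to show why such a graph can be much smaller than the data it summarizes: its size depends on the number of distinct symbols, not on the number of records.

```python
from collections import Counter
from itertools import combinations


def build_symbol_graph(records):
    """Summarize categorical records as a weighted symbol co-occurrence graph.

    Assumed construction (not from the paper): a node is an
    (attribute index, value) pair, and an edge weight is the number of
    records that contain both of its endpoint symbols.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for record in records:
        # Tag each value with its attribute index so equal strings in
        # different attributes stay distinct symbols.
        symbols = [(j, value) for j, value in enumerate(record)]
        node_counts.update(symbols)
        # Symbols are ordered by attribute index, so every pair is emitted
        # in one canonical orientation.
        edge_counts.update(combinations(symbols, 2))
    return node_counts, edge_counts


# Toy data: 4000 records, but only a handful of distinct symbols.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
] * 1000

nodes, edges = build_symbol_graph(records)
print(len(records), len(nodes), len(edges))
```

Under this assumed construction, a clustering step would operate on the small graph rather than on every record, which is one plausible reading of how the data set is "reduced"; the paper's actual graph definition and clustering procedure may differ.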