Computer Science (计算机科学), 2021, Vol. 48, Issue (1): 136-144. doi: 10.11896/jsjkx.200700213
LIANG Wei (梁伟)¹,², DUAN Xiao-dong (段晓东)¹, XU Jian-feng (徐健锋)¹,³,⁴
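Abstract: Preprocessing the base clusterings is an important step in clustering ensemble algorithms. Many studies have shown that the diversity of the set of base clusterings affects the performance of a clustering ensemble. Current clustering ensemble research concentrates on generating base clusterings and on optimizing the consensus strategy, while work on measuring the diversity of base clusterings, and on optimizing the ensemble with respect to that diversity, is still limited. This paper proposes a diversity measure for base clusterings based on Jaccard similarity and, building on the idea of three-way decisions, a three-way filtering method for base clusterings driven by this measure. The method first sets initial three-way decision thresholds α(0) and β(0), then computes the diversity measure of every base clustering and makes a three-way decision on each one. The decision rule is as follows: if the diversity measure of a base clustering is less than α(0), the base clustering is deleted; if it is greater than β(0), the base clustering is retained; if it is greater than α(0) and less than β(0), the base clustering is placed in the boundary region of the three-way decision and awaits further judgment. After each round, the algorithm recomputes the thresholds α(1) and β(1) and applies a new three-way decision to the boundary region of the previous round, repeating until no base clustering falls into the boundary region or a specified number of iterations is reached. Comparative experiments show that the diversity-based three-way filtering of base clusterings effectively improves clustering ensemble performance.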
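The abstract does not give the exact formula of the Jaccard-based diversity measure or the rule used to recompute α and β, so the Python sketch below fills those gaps with clearly labeled assumptions rather than the authors' definitions. It assumes (1) the pair-counting Jaccard index between two partitions, J = n11 / (n11 + n10 + n01), where n11 counts object pairs co-clustered by both partitions; (2) a member's diversity equals 1 minus its mean Jaccard similarity to every other member, computed once; (3) a hypothetical threshold update that shrinks [α, β] around its midpoint by a factor `shrink`; and (4) scores exactly equal to a threshold go to the boundary region, since the abstract leaves ties unspecified. The function names `pair_jaccard`, `diversity_scores` and `three_way_filter` are illustrative only.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import Hashable, List, Sequence, Tuple


def pair_jaccard(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Pair-counting Jaccard similarity between two labelings of the same objects.

    J = n11 / (n11 + n10 + n01): n11 counts object pairs placed in the same
    cluster by both labelings; n10 and n01 count pairs co-clustered by only one.
    """
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same objects")
    # Pairs that share a contingency-table cell are co-clustered in both labelings.
    n11 = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())  # n11 + n10
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())  # n11 + n01
    union = pairs_a + pairs_b - n11
    # Two all-singleton labelings are identical, so treat them as fully similar.
    return 1.0 if union == 0 else n11 / union


def diversity_scores(members: List[Sequence[Hashable]]) -> List[float]:
    """Assumed measure: 1 - mean Jaccard similarity of a member to all others."""
    m = len(members)
    if m < 2:
        raise ValueError("need at least two base clusterings")
    sim = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        sim[i][j] = sim[j][i] = pair_jaccard(members[i], members[j])
    # The diagonal stays 0, so each row sum covers only the other m - 1 members.
    return [1.0 - sum(row) / (m - 1) for row in sim]


def three_way_filter(
    members: List[Sequence[Hashable]],
    alpha: float,
    beta: float,
    max_iter: int = 10,
    shrink: float = 0.5,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (retained, deleted, still-undecided) member indices."""
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("expected 0 <= alpha < beta <= 1")
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must lie strictly between 0 and 1")
    d = diversity_scores(members)
    kept: List[int] = []
    deleted: List[int] = []
    boundary = list(range(len(members)))
    for _ in range(max_iter):
        undecided = []
        for k in boundary:
            if d[k] < alpha:
                deleted.append(k)    # low diversity: delete the member
            elif d[k] > beta:
                kept.append(k)       # high diversity: retain the member
            else:
                undecided.append(k)  # boundary region (ties assumed to land here)
        boundary = undecided
        if not boundary:
            break
        # Hypothetical update rule: shrink [alpha, beta] around its midpoint.
        mid, half = (alpha + beta) / 2, (beta - alpha) * shrink / 2
        alpha, beta = mid - half, mid + half
    return sorted(kept), sorted(deleted), boundary
```

For example, with the four labelings [0,0,0,1,1,1], an identical copy of it, [0,0,1,1,2,2] and [0,1,0,1,0,1], and α = 0.4, β = 0.85, the first round retains only [0,1,0,1,0,1] and leaves the other three in the boundary region; once the thresholds shrink to about 0.51 and 0.74, the two identical (and therefore low-diversity) labelings are deleted and [0,0,1,1,2,2] is retained. Computing the scores once keeps the sketch simple; recomputing diversity against the members still undecided after each round would be an equally plausible reading of the abstract.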