Computer Science, 2021, Vol. 48, Issue (1): 136-144. doi: 10.11896/jsjkx.200700213

• Database & Big Data & Data Science •

Three-way Filtering Algorithm of Basic Clustering Based on Differential Measurement

LIANG Wei1,2, DUAN Xiao-dong1, XU Jian-feng1,3,4   

  1 School of Software, Nanchang University, Nanchang 330047, China
  2 School of Software Engineering, South China University of Technology, Guangzhou 510006, China
  3 College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  4 Tellhow Software Co., LTD, Nanchang 330096, China
  • Received:2020-07-31 Revised:2020-08-30 Online:2021-01-15 Published:2021-01-15
  • About the authors: LIANG Wei, born in 1993, Ph.D candidate, is a student member of China Computer Federation. His main research interests include machine learning, granular computing, three-way decision and ensemble clustering.
    XU Jian-feng, born in 1973, Ph.D candidate, professor, is a member of China Computer Federation. His main research interests include data mining, rough set, three-way decision and machine learning.
  • Supported by:
    National Natural Science Foundation of China(61763031) and Jiangxi Provincial Natural Science Foundation(20202BAB202018).

Abstract: Pre-processing the basic clustering members is an important step in ensemble clustering. Numerous studies have shown that the differences among the basic clustering members affect the performance of ensemble clustering. Current ensemble clustering research focuses on generating basic clusterings and integrating them, while the measurement of differences among basic clustering members and the optimization of the member set remain underdeveloped. Based on Jaccard similarity, this study proposes a differential measure for basic clustering members and, by introducing the idea of three-way decisions, constructs a differential three-way filtering method for them. The method first sets the initial three-way decision thresholds α(0) and β(0), then calculates the differential of each basic clustering member and makes a three-way decision on it. The decision strategy is as follows: when the differential metric of a basic clustering member is less than the threshold α(0), the member is deleted; when it is greater than the threshold β(0), the member is retained; and when it is greater than α(0) and less than β(0), the member is placed in the boundary domain, to be judged again by three-way decisions with new thresholds. After each round of three-way decisions, the algorithm recalculates the thresholds and applies the three-way decisions again to the boundary domain left over from the previous round, until no basic clustering member is added to the boundary domain or the specified number of iterations is reached. Comparative experiments show that the differential-measurement three-way filtering method for basic clusterings can effectively improve the performance of ensemble clustering.
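The filtering loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact differential formula or the threshold update rule, so the sketch assumes (a) a member's differential is its mean Jaccard distance to the other members, computed on sets of co-clustered sample pairs, (b) thresholds are updated by halving the interval [α, β] around its midpoint, and (c) a score exactly equal to a threshold falls into the boundary domain. The names `co_clustered_pairs`, `differentials`, and `three_way_filter` are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of sample-index pairs that share a cluster label."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}


def jaccard_similarity(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def differentials(members):
    """ASSUMED measure: mean Jaccard distance of each member to all others."""
    pair_sets = [co_clustered_pairs(m) for m in members]
    scores = []
    for i, own in enumerate(pair_sets):
        dists = [1.0 - jaccard_similarity(own, other)
                 for j, other in enumerate(pair_sets) if j != i]
        scores.append(sum(dists) / len(dists) if dists else 0.0)
    return scores


def three_way_filter(members, alpha0, beta0, max_rounds=10):
    """Split member indices into (retained, deleted, boundary)."""
    scores = differentials(members)
    retained, deleted = [], []
    boundary = list(range(len(members)))  # every member is undecided at first
    alpha, beta = alpha0, beta0
    for _ in range(max_rounds):
        if not boundary:
            break  # nothing left in the boundary domain
        undecided = []
        for idx in boundary:
            if scores[idx] < alpha:
                deleted.append(idx)      # too similar to the rest: delete
            elif scores[idx] > beta:
                retained.append(idx)     # sufficiently different: retain
            else:
                undecided.append(idx)    # defer to the next round
        boundary = undecided
        # ASSUMED update rule: halve the interval around its midpoint.
        mid = (alpha + beta) / 2
        alpha, beta = (alpha + mid) / 2, (beta + mid) / 2
    return sorted(retained), sorted(deleted), sorted(boundary)


if __name__ == "__main__":
    # Four toy basic clusterings of four samples (label vectors).
    toy_members = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1]]
    print(three_way_filter(toy_members, alpha0=0.6, beta0=0.8))
```

Members still undecided when the iteration limit is reached are returned as the third list; how the paper treats them is not stated in the abstract, so the sketch leaves that choice to the caller.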

Key words: Basic clustering filtering, Three-way decision, Three-way optimization, Clustering ensemble, Differential measurement

CLC Number: TP18