Influence of Noisy Features on Internal Validation of Clustering

doi:10.11896/j.issn.1002-137X.2018.07.004

Abstract

Internal validation measures are essential in clustering analysis: they evaluate the quality of clustering results and serve as indicators for finding the optimal number of clusters when the true structure of the sample is unknown. Although many studies have examined the performance of internal validation measures and found that some perform better than others, they ignore the influence of the noisy features present in real data, which may mislead the selection and application of these measures. This study uses 10 clustering validation measures to determine the number of clusters in simulated and real datasets, in order to analyze the influence of noisy features on the choice of internal validation measure and on clustering results. The results indicate that noisy features in a dataset affect all of the internal validation indices considered except KL, CH and CCC, and that the accuracy of clustering results decreases as the noise increases.

Key words: Clustering accuracy, Internal validation, Noisy features, Number of clusters
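
As a rough illustration of the kind of experiment the abstract describes (not the authors' code or data), the sketch below selects the number of clusters with one internal validation index, the Calinski-Harabasz (CH) index, on a simulated dataset with and without uninformative features. Everything in it is an assumption made for illustration: the three-cluster toy data, the uniform noise features, the small k-means with farthest-point initialization, and the helper names `make_data`, `kmeans`, `calinski_harabasz` and `best_k`.

```python
import random


def make_data(n_per_cluster=50, n_noise=0, seed=0):
    """Toy data: 3 Gaussian clusters in 2 informative features,
    plus n_noise uniform features that carry no cluster structure."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = []
    for cx, cy in centers:
        for _ in range(n_per_cluster):
            point = [rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
            point += [rng.uniform(0.0, 10.0) for _ in range(n_noise)]
            data.append(point)
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(data, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [data[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers


def calinski_harabasz(data, labels, centers):
    """CH index: between-cluster dispersion over within-cluster dispersion,
    each scaled by its degrees of freedom. Larger is better."""
    n, k = len(data), len(centers)
    overall = mean(data)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    between = sum(labels.count(j) * sq_dist(centers[j], overall) for j in range(k))
    return (between / (k - 1)) / (within / (n - k))


def best_k(data, k_range=range(2, 7)):
    """Return the k with the highest CH index, plus all scores."""
    scores = {}
    for k in k_range:
        labels, centers = kmeans(data, k)
        scores[k] = calinski_harabasz(data, labels, centers)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for n_noise in (0, 2, 10):
        k, _ = best_k(make_data(n_noise=n_noise))
        print(f"noise features: {n_noise:2d} -> k selected by CH: {k}")
```

Swapping `calinski_harabasz` for another index, or changing how many noise features `make_data` appends, gives a small-scale version of the comparison the study reports; the toy setup is far simpler than the simulated and real datasets used in the paper, so its output should not be read as reproducing the paper's findings.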

CLC Number: TP391

YANG Hu, FU Yu, FAN Dan. Influence of Noisy Features on Internal Validation of Clustering [J]. Computer Science, 2018, 45(7): 22-30.


URL: https://www.jsjkx.com/EN/10.11896／j.issn.1002-137X.2018.07.004

https://www.jsjkx.com/EN/Y2018/V45/I7/22


