Research on Clustering with Feature Preferences

doi:10.11896/j.issn.1002-137X.2015.05.012

Abstract

Traditional clustering methods, such as k-means and fuzzy c-means, generally do not distinguish the contributions or importance of individual data features to individual clusters. On high-dimensional data they therefore often cluster poorly, because they largely ignore high correlation or redundancy between features. To mitigate this, feature weights for each cluster can be introduced into the clustering objective, which yields cluster-dependent weights automatically and also improves clustering performance. Even so, the feature weights obtained by an unsupervised clustering algorithm do not necessarily match the relative importance (or preferences) among features that users expect. This paper therefore uses actual user preferences to design a clustering method that reflects feature preferences. The proposed method not only extends existing clustering methods with globally weighted, cluster-independent features to one with locally weighted, cluster-dependent features, but also improves clustering performance under feature preferences.

Key words: Clustering analysis, Feature preferences, Feature weighting, Cluster-dependent, Quadratic programming
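
The idea of cluster-dependent feature weights mentioned in the abstract can be illustrated with a small sketch. The code below is not the paper's method: it omits user preferences and the quadratic-programming step entirely, and instead uses an entropy-style softmax weight update, which is an assumption chosen here for brevity. The names `weighted_sq_dist`, `local_weight_kmeans`, `init_centers`, and `gamma` are hypothetical.

```python
import math


def weighted_sq_dist(x, center, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center))


def local_weight_kmeans(points, init_centers, gamma=1.0, n_iter=20):
    """k-means variant that keeps a separate feature-weight vector per cluster.

    Assumption of this sketch: weights follow a softmax of the negative
    within-cluster dispersion. This stands in for, and is not, the
    preference-constrained weight update described in the paper.
    """
    k, d = len(init_centers), len(points[0])
    centers = [list(c) for c in init_centers]
    weights = [[1.0 / d] * d for _ in range(k)]  # start from uniform weights
    labels = [0] * len(points)
    for _ in range(n_iter):
        # 1) Assign each point to the cluster with the smallest weighted distance.
        labels = [
            min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], weights[c]))
            for x in points
        ]
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # empty cluster: keep its previous center and weights
            # 2) Update the center as the per-feature mean of its members.
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
            # 3) Update weights: low dispersion on a feature gives a high weight.
            disp = [
                sum((x[j] - centers[c][j]) ** 2 for x in members) for j in range(d)
            ]
            lo = min(disp)  # shift before exponentiating, for numerical stability
            expo = [math.exp(-(v - lo) / gamma) for v in disp]
            total = sum(expo)
            weights[c] = [e / total for e in expo]  # each weight vector sums to 1
    return labels, centers, weights
```

Called with points where one group is tight in the first feature and another is tight in the second, the returned `weights` should put most of the mass on a different feature for each cluster, which is the cluster-dependent behaviour the abstract contrasts with globally weighted features. A larger `gamma` makes the weights more uniform.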

FANG Ling and CHEN Song-can. Research on Clustering with Feature Preferences[J]. Computer Science, 2015, 42(5): 57-61.

