Computer Science, 2017, Vol. 44, Issue (1): 65-70. doi: 10.11896/j.issn.1002-137X.2017.01.012

2016 Sixth China Conference on Data Mining

Improved Multi-view Clustering Ensemble Algorithm

DENG Qiang, YANG Yan, WANG Hao

  School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
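  • Publication date: 2018-11-13  Online date: 2018-11-13
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61170111, 61572407, 61134002), the National Key Technology R&D Program of China (2015BAH19F02), and the Science and Technology Support Program of Sichuan Province (2014SZ0207).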

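Abstract: In recent years, research on data mining techniques and machine learning algorithms for big data has become increasingly important. In clustering, the widespread emergence of multi-view data has made multi-view clustering an important class of clustering methods. However, most existing multi-view clustering algorithms are sensitive to parameter settings and to the data samples themselves, so their clustering results are unstable and their parameters must be tuned repeatedly. Building on the multi-view K-means algorithm and clustering ensemble techniques, we propose an improved multi-view clustering ensemble algorithm that improves the accuracy, robustness, and stability of clustering. Furthermore, because multi-view clustering algorithms running on a single machine struggle to process massive data, we use distributed processing techniques to implement a distributed, parallel multi-view clustering algorithm. Experiments show that the parallel algorithm substantially reduces running time on large datasets and is suitable for multi-view cluster analysis in big data environments.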
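The abstract does not spell out how the base clusterings are combined. As a minimal sketch, assuming a common co-association (evidence-accumulation) consensus rather than the authors' exact scheme, the Python below runs plain K-means several times on each view (substituting for the multi-view K-means base learner the paper builds on), counts how often each pair of objects falls in the same cluster, and links pairs that agree in more than half of the base partitions. The function names `kmeans`, `consensus`, and `multiview_ensemble` and the `runs_per_view` parameter are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, seed: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's K-means on a single view; returns one label per object.

    Assumption: this stands in for the multi-view K-means base learner.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initialize from k objects
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus(partitions: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Co-association consensus: link objects i and j when they share a
    cluster in more than `threshold` of the base partitions, then return
    the connected components as the final labels."""
    n, m = len(partitions[0]), len(partitions)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(1 for labels in partitions if labels[i] == labels[j])
            if agree / m > threshold:
                parent[find(i)] = find(j)
    roots: Dict[int, int] = {}
    # Number components 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def multiview_ensemble(views: List[List[Point]], k: int, runs_per_view: int = 5) -> List[int]:
    """Hypothetical driver: several K-means runs per view, one consensus.
    Row i of every view must describe the same object."""
    partitions = [kmeans(view, k, seed) for view in views for seed in range(runs_per_view)]
    return consensus(partitions)


if __name__ == "__main__":
    view1 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    view2 = [(x + 100.0, -y) for x, y in view1]  # a second view of the same six objects
    print(multiview_ensemble([view1, view2], k=2))  # [0, 0, 0, 1, 1, 1]
```

Because pairs are compared by co-membership rather than by label value, the consensus is unaffected by label permutations across runs; the number of consensus clusters is set by the vote threshold and is not forced to equal `k`.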

Key words: Multi-view clustering, Clustering ensemble, Distributed computation, Parallelization
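The abstract says the parallel version relies on distributed processing but does not say how the work is split. One plausible decomposition, shown here as a hedged sketch, is a MapReduce-style K-means pass: each worker assigns the points of its own data shard to the nearest center and emits only per-center partial sums, and a reduce step merges those sums into new centers. A `ThreadPoolExecutor` stands in for cluster nodes, so this illustrates the data flow rather than real speedup; `map_shard`, `reduce_partials`, and `parallel_kmeans` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
# center id -> (per-coordinate sums, number of points)
Partial = Dict[int, Tuple[List[float], int]]


def map_shard(shard: List[Point], centers: List[List[float]]) -> Partial:
    """Map step (one worker): assign every point of a local shard to its
    nearest center and emit only per-center partial sums."""
    out: Partial = {}
    for p in shard:
        nearest = min(range(len(centers)),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
        sums, count = out.get(nearest, ([0.0] * len(p), 0))
        out[nearest] = ([s + x for s, x in zip(sums, p)], count + 1)
    return out


def reduce_partials(partials: List[Partial], centers: List[List[float]]) -> List[List[float]]:
    """Reduce step: merge the partial sums of all shards into new centers."""
    totals: Partial = {}
    for part in partials:
        for c, (sums, count) in part.items():
            t_sums, t_count = totals.get(c, ([0.0] * len(sums), 0))
            totals[c] = ([a + b for a, b in zip(t_sums, sums)], t_count + count)
    new_centers = []
    for c, old in enumerate(centers):
        if c in totals:
            sums, count = totals[c]
            new_centers.append([s / count for s in sums])
        else:
            new_centers.append(list(old))  # no points assigned: keep old center
    return new_centers


def parallel_kmeans(shards: List[List[Point]], init: List[Point],
                    iters: int = 20, workers: int = 4) -> List[List[float]]:
    """Driver: broadcast centers, map over shards in parallel, reduce, repeat."""
    centers = [list(c) for c in init]
    with ThreadPoolExecutor(max_workers=workers) as pool:  # threads stand in for nodes
        for _ in range(iters):
            partials = list(pool.map(map_shard, shards, [centers] * len(shards)))
            new_centers = reduce_partials(partials, centers)
            if new_centers == centers:
                break  # centers stable: converged
            centers = new_centers
    return centers


if __name__ == "__main__":
    shards = [[(0.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (10.0, 10.0)], [(10.0, 11.0), (11.0, 10.0)]]
    # centers close to (1/3, 1/3) and (31/3, 31/3)
    print(parallel_kmeans(shards, [(0.0, 0.0), (10.0, 10.0)], workers=3))
```

Each worker sends at most k (sums, count) records per iteration instead of its raw points, which is what makes this pattern attractive for large data; under this assumption, the same pass could be reused for every base clustering in an ensemble.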

