Computer Science, 2017, Vol. 44, Issue (1): 65-70. doi: 10.11896/j.issn.1002-137X.2017.01.012

• 2016 6th China Conference on Data Mining •

Improved Multi-view Clustering Ensemble Algorithm

DENG Qiang, YANG Yan and WANG Hao

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61170111, 61572407, 61134002), the National Key Technology R&D Program of China (2015BAH19F02), and the Science and Technology Support Program of Sichuan Province (2014SZ0207).

Abstract: In recent years, data mining techniques and machine learning algorithms for big data have become increasingly important. In clustering, the growing availability of multi-view data has made multi-view clustering an important class of methods. However, most existing multi-view clustering algorithms are sensitive to parameter settings and to the data samples themselves, so their results are often unstable and their parameters need repeated tuning. To address this, we propose an improved multi-view clustering ensemble algorithm that builds on the multi-view K-means algorithm and clustering ensemble techniques, improving the accuracy, robustness, and stability of the clustering results. In addition, because a single machine has limited computing resources and a single-machine multi-view clustering algorithm struggles to process massive data, we implement a distributed, parallel version of the algorithm using distributed processing technology. Experimental results show that the parallel algorithm is substantially more time-efficient on large datasets and is suitable for multi-view clustering analysis in big data environments.

Key words: Multi-view clustering, Clustering ensemble, Distributed computing, Parallelization
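
To make the general idea of a clustering ensemble over several views concrete, the following is a minimal Python sketch under stated assumptions. It is not the algorithm proposed in the paper: it swaps in ordinary K-means run independently on each view (rather than a joint multi-view K-means) and combines the base partitions with a simple co-association matrix and a thresholded connected-components consensus. All function names, the toy data, the seeds, and the 0.5 threshold are hypothetical choices made for illustration only.

```python
import random


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's K-means on a list of equal-length tuples; returns labels."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct samples as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members
        # (an empty cluster keeps its previous center).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_association(partitions, n):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(partitions)
    return co


def consensus(co, threshold=0.5):
    """Consensus labels: connected components of the graph that links
    every pair whose co-association exceeds the threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def multiview_ensemble(views, k, seeds=(0, 1, 2)):
    """Cluster every view with several seeds, then merge all base partitions."""
    n = len(views[0])
    partitions = [kmeans(view, k, seed) for view in views for seed in seeds]
    return consensus(co_association(partitions, n))


# Toy data (hypothetical): six samples described by two views.
view_a = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
view_b = [(5.0,), (5.5,), (6.0,), (50.0,), (51.0,), (52.0,)]
result = multiview_ensemble([view_a, view_b], k=2)
print(result)
```

Because the consensus step only looks at how often two samples are grouped together, it does not depend on how each base run happens to number its clusters, which is one reason co-association is a common starting point for ensemble methods. The base partitions are independent of one another, so in a distributed setting they could be computed in parallel before the merge.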

