计算机科学 ›› 2018, Vol. 45 ›› Issue (11): 292-297.doi: 10.11896/j.issn.1002-137X.2018.11.047
刘端阳, 郑江帆, 沈国江, 刘志
LIU Duan-yang, ZHENG Jiang-fan, SHEN Guo-jiang, LIU Zhi
摘要: k-means算法在面对大规模数据集时,计算时间将随着数据集的增大而成倍增长。为了提升算法的运算性能,设计了一种基于CUDA(Compute Unified Device Architecture)编程模型的并化行k-means算法,即GS_k-means算法。对k-means算法进行了并行化分析,在距离计算前,运用全局选择判断数据所属聚簇是否改变,减少冗余计算;在距离计算时,采用通用矩阵乘加速,加快计算速度;在簇中心点更新时,将所有数据按照簇标签排序分组,将组内数据简单相加,减少原子内存操作,从而提高整体性能。使用KDDCUP99数据集对改进算法进行实验,结果表明,在保证实验结果的准确性的情况下,改进算法加快了计算速度,与经典的GPUMiner算法相比加速比提升5倍。
中图分类号:
[1]MACQUEEN J B.Some Methods for Classification and Analysis of MultiVariate Observations[C]∥Proceedings of Berkeley Symposium on Mathematical Statistics and Probability.1967:281-297. [2]LE V H,KIM S R.K-strings algorithm,anewapproach based on Kmeans[C]∥Conference on Research in Adaptive and Convergent Systems.ACM,2015:15-20. [3]KUMAR G R,MANGATHAYARU N,NARASIMHA G.An improved k-Means Clustering algorithm for Intrusion Detection using Gaussian function[C]∥The International Conference on Engineering &Mis.ACM,2015:1-7. [4]NVIDIA Corporation.CUDA Technology[OL].http://www.nvidia.com/CUDA. [5]ZECHNER M,GRANITZER M.Accelerating K-Means on theGraphics Processor via CU-DA[C]∥First International Con-ference on Intensive Applications and Services.IEEE Computer Society,2009:7-15. [6]LI Y,ZHAO K,CHU X,et al.Speeding up K-Means Algorithm by GPUs[C]∥International Conference on Computer and Information Technology.IEEE,2010:115-122. [7]ZHONG S,LIN S,XU G,et al.The expansibility research of K-Means algorithm under the GPU[C]∥IEEE International Conference on Software Engineering and Service Science.IEEE,2017:734-737. [8]HUANG P,LI X,YUAN B.A Parallel GPU-Based Approach to Clustering Very Fast Data Streams[C]∥ACM International on Conference on Information and Knowledge Management.ACM,2015:23-32. [9]WU J,HONG B.An Efficient k-Means Algorithm on CUDA[C]∥ IEEE International Symposiumon Parallel and Distributed Processing Workshops and Phd Forum.IEEE Computer Society,2011:1740-1749. [10]HUI Z N.Multidimensional data clustering alogorithm research and GPU acceleration based Global K-means[D].Xi’an:Xidian University,2012.(in Chinees) 惠转妮.基于Global K-means的多维数据聚类算法研究及其GPU加速[D].西安:西安电子科技大学,2012. [11]KAKOOEI M,SHAHHOSEINI H S.A parallel k-means clustering initial center selection and dynamic center correction on GPU[C]∥Electrical Engineering.IEEE,2015:20-25. [12]BHIMANI J,LEESER M,MI N.Accelerating K-Means clustering with parallel implementations and GPU computing[C]∥High Performance Extreme Computing Conference.IEEE,2015:1-6. [13]BAYDOUN M,DAWI M,GHAZIRI H.Enhanced parallel implementation of the K-Means clustering algorithm[C]∥International Conference on Advances in Computational TOOLS for Engineering Applications.IEEE,2016:7-11. [14]ABBASITABAR H,SAMAVATIAN M H,SA-RBAZI-AZAD H.ASHA:An Adaptive Shared-Memory Sharing Architecture for Multi-Programmed GPUs[J].Microprocessors & Microsystems,2016,46:264-273. [15]HETTICH S,BAY S D.KDD cup 1999 data[EB/OL].http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. [16]WANG Q,MEGALOOIKONOMOU V.A clustering algorithm for intrusion detection[J].Proc Spie,2008,5812(5812):31-38. [17]FANG W,LAU K K,LUM,et al.Parallel datamining on gra-phics processors[OL].http://11130.126.143.33/sites/default/files/papers/358/gpuminer.pdf. |
[1] | 陈晶, 吴玲玲. 多源异构环境下的车联网大数据混合属性特征检测方法 Mixed Attribute Feature Detection Method of Internet of Vehicles Big Datain Multi-source Heterogeneous Environment 计算机科学, 2022, 49(8): 108-112. https://doi.org/10.11896/jsjkx.220300273 |
[2] | 何强, 尹震宇, 黄敏, 王兴伟, 王源田, 崔硕, 赵勇. 基于大数据的进化网络影响力分析研究综述 Survey of Influence Analysis of Evolutionary Network Based on Big Data 计算机科学, 2022, 49(8): 1-11. https://doi.org/10.11896/jsjkx.210700240 |
[3] | 陈鑫, 李芳, 丁海昕, 孙唯哲, 刘鑫, 陈德训, 叶跃进, 何香. 面向国产异构众核架构的CFD非结构网格计算并行优化方法 Parallel Optimization Method of Unstructured-grid Computing in CFD for DomesticHeterogeneous Many-core Architecture 计算机科学, 2022, 49(6): 99-107. https://doi.org/10.11896/jsjkx.210400157 |
[4] | 汪晋, 刘江. 基于GPU的并行DILU预处理技术 GPU-based Parallel DILU Preconditioning Technique 计算机科学, 2022, 49(6): 108-118. https://doi.org/10.11896/jsjkx.210300259 |
[5] | 孙轩, 王焕骁. 政务大数据安全防护能力建设:基于技术和管理视角的探讨 Capability Building for Government Big Data Safety Protection:Discussions from Technologicaland Management Perspectives 计算机科学, 2022, 49(4): 67-73. https://doi.org/10.11896/jsjkx.211000010 |
[6] | 王美珊, 姚兰, 高福祥, 徐军灿. 面向医疗集值数据的差分隐私保护技术研究 Study on Differential Privacy Protection for Medical Set-Valued Data 计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032 |
[7] | 王俊, 王修来, 庞威, 赵鸿飞. 面向科技前瞻预测的大数据治理研究 Research on Big Data Governance for Science and Technology Forecast 计算机科学, 2021, 48(9): 36-42. https://doi.org/10.11896/jsjkx.210500207 |
[8] | 余乐章, 夏天宇, 荆一楠, 何震瀛, 王晓阳. 面向大数据分析的智能交互向导系统 Smart Interactive Guide System for Big Data Analytics 计算机科学, 2021, 48(9): 110-117. https://doi.org/10.11896/jsjkx.200900083 |
[9] | 王立梅, 朱旭光, 汪德嘉, 张勇, 邢春晓. 基于深度学习的民事案件判决结果分类方法研究 Study on Judicial Data Classification Method Based on Natural Language Processing Technologies 计算机科学, 2021, 48(8): 80-85. https://doi.org/10.11896/jsjkx.210300130 |
[10] | 傅天豪, 田鸿运, 金煜阳, 杨章, 翟季冬, 武林平, 徐小文. 一种面向构件化并行应用程序的性能骨架分析方法 Performance Skeleton Analysis Method Towards Component-based Parallel Applications 计算机科学, 2021, 48(6): 1-9. https://doi.org/10.11896/jsjkx.201200115 |
[11] | 何亚茹, 庞建民, 徐金龙, 朱雨, 陶小涵. 基于神威平台的Floyd并行算法的实现和优化 Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform 计算机科学, 2021, 48(6): 34-40. https://doi.org/10.11896/jsjkx.201100051 |
[12] | 冯凯, 马鑫玉. (n,k)-冒泡排序网络的子网络可靠性 Subnetwork Reliability of (n,k)-bubble-sort Networks 计算机科学, 2021, 48(4): 43-48. https://doi.org/10.11896/jsjkx.201100139 |
[13] | 王雪岑, 张昱, 刘迎婕, 于戈. 基于表示学习的在线学习交互质量评价方法 Evaluation of Quality of Interaction in Online Learning Based on Representation Learning 计算机科学, 2021, 48(2): 207-211. https://doi.org/10.11896/jsjkx.201000042 |
[14] | 滕建, 滕飞, 李天瑞. 基于3D卷积和LSTM编码解码的出行需求预测 Travel Demand Forecasting Based on 3D Convolution and LSTM Encoder-Decoder 计算机科学, 2021, 48(12): 195-203. https://doi.org/10.11896/jsjkx.210400022 |
[15] | 张育龙, 王强, 陈明康, 孙静涛. 图像去雨算法在云物联网应用中的研究综述 Survey of Intelligent Rain Removal Algorithms for Cloud-IoT Systems 计算机科学, 2021, 48(12): 231-242. https://doi.org/10.11896/jsjkx.201000055 |
|