Computer Science ›› 2018, Vol. 45 ›› Issue (11): 292-297.doi: 10.11896/j.issn.1002-137X.2018.11.047

• Interdiscipline & Frontier • Previous Articles     Next Articles

Study on Parallel K-means Algorithm Based on CUDA

LIU Duan-yang, ZHENG Jiang-fan, SHEN Guo-jiang, LIU Zhi   

  1. (College of Computer Science and Technology,Zhejiang University of Technology,Hangzhou 310023,China)
  • Received:2017-10-30 Published:2019-02-25

Abstract: When k-means algorithm is confronted with large dataset,the computation time will grow exponentially with the increase of the dataset.In order to improve the computational performance of this algorithm,this paper designed a parallel k-means algorithm based on CUDA (Compute Unified Device Architecture) programming model,called GS_k-means.According to the parallel analysis of k-means algorithm,the global selection is used to determine whether the cluster of data is changed before distance calculation,thus reducing redundant computation.The calculation speed is accelerated by using the universal matrix multiplication in distance calculation.When the cluster center is updated,all data are grouped by cluster tag,and the data in the group are simply added,thus reducing the atomic memory operation andimproving the overall performance.The experimental results on KDDCUP99 datasetshow that on the condition of ensuring the accuracy of the experimental results,the improved algorithm the calculation speed and is five times faster than the classical GPUMiner algorithm.

Key words: k-means, CUDA, Parallel computation, Big data

CLC Number: 

  • TP391
[1]MACQUEEN J B.Some Methods for Classification and Analysis of MultiVariate Observations[C]∥Proceedings of Berkeley Symposium on Mathematical Statistics and Probability.1967:281-297.
[2]LE V H,KIM S R.K-strings algorithm,anewapproach based on Kmeans[C]∥Conference on Research in Adaptive and Convergent Systems.ACM,2015:15-20.
[3]KUMAR G R,MANGATHAYARU N,NARASIMHA G.An improved k-Means Clustering algorithm for Intrusion Detection using Gaussian function[C]∥The International Conference on Engineering &Mis.ACM,2015:1-7.
[4]NVIDIA Corporation.CUDA Technology[OL].
[5]ZECHNER M,GRANITZER M.Accelerating K-Means on theGraphics Processor via CU-DA[C]∥First International Con-ference on Intensive Applications and Services.IEEE Computer Society,2009:7-15.
[6]LI Y,ZHAO K,CHU X,et al.Speeding up K-Means Algorithm by GPUs[C]∥International Conference on Computer and Information Technology.IEEE,2010:115-122.
[7]ZHONG S,LIN S,XU G,et al.The expansibility research of K-Means algorithm under the GPU[C]∥IEEE International Conference on Software Engineering and Service Science.IEEE,2017:734-737.
[8]HUANG P,LI X,YUAN B.A Parallel GPU-Based Approach to Clustering Very Fast Data Streams[C]∥ACM International on Conference on Information and Knowledge Management.ACM,2015:23-32.
[9]WU J,HONG B.An Efficient k-Means Algorithm on CUDA[C]∥ IEEE International Symposiumon Parallel and Distributed Processing Workshops and Phd Forum.IEEE Computer Society,2011:1740-1749.
[10]HUI Z N.Multidimensional data clustering alogorithm research and GPU acceleration based Global K-means[D].Xi’an:Xidian University,2012.(in Chinees) 惠转妮.基于Global K-means的多维数据聚类算法研究及其GPU加速[D].西安:西安电子科技大学,2012.
[11]KAKOOEI M,SHAHHOSEINI H S.A parallel k-means clustering initial center selection and dynamic center correction on GPU[C]∥Electrical Engineering.IEEE,2015:20-25.
[12]BHIMANI J,LEESER M,MI N.Accelerating K-Means clustering with parallel implementations and GPU computing[C]∥High Performance Extreme Computing Conference.IEEE,2015:1-6.
[13]BAYDOUN M,DAWI M,GHAZIRI H.Enhanced parallel implementation of the K-Means clustering algorithm[C]∥International Conference on Advances in Computational TOOLS for Engineering Applications.IEEE,2016:7-11.
[14]ABBASITABAR H,SAMAVATIAN M H,SA-RBAZI-AZAD H.ASHA:An Adaptive Shared-Memory Sharing Architecture for Multi-Programmed GPUs[J].Microprocessors & Microsystems,2016,46:264-273.
[15]HETTICH S,BAY S D.KDD cup 1999 data[EB/OL].
[16]WANG Q,MEGALOOIKONOMOU V.A clustering algorithm for intrusion detection[J].Proc Spie,2008,5812(5812):31-38.
[17]FANG W,LAU K K,LUM,et al.Parallel datamining on gra-phics processors[OL].http://11130.126.143.33/sites/default/files/papers/358/gpuminer.pdf.
[1] YE Ya-zhen, LIU Guo-hua, ZHU Yang-yong. Two-step Authorization Pattern of Data Product Circulation [J]. Computer Science, 2021, 48(1): 119-124.
[2] ZHAO Hui-qun, WU Kai-feng. Big Data Valuation Algorithm [J]. Computer Science, 2020, 47(9): 110-116.
[3] MA Meng-yu, WU Ye, CHEN Luo, WU Jiang-jiang, LI Jun, JING Ning. Display-oriented Data Visualization Technique for Large-scale Geographic Vector Data [J]. Computer Science, 2020, 47(9): 117-122.
[4] WANG Liang, ZHOU Xin-zhi, YNA Hua. Real-time SIFT Algorithm Based on GPU [J]. Computer Science, 2020, 47(8): 105-111.
[5] XU Shou-kun, NI Chu-han, JI Chen-chen, LI Ning. Image Caption of Safety Helmets Wearing in Construction Scene Based on YOLOv3 [J]. Computer Science, 2020, 47(8): 233-240.
[6] CHAO Le-men. Course Design and Redesign for Introduction to Data Science [J]. Computer Science, 2020, 47(7): 1-7.
[7] GU Rong-Jie, WU Zhi-ping and SHI Huan. New Approach for Graded and Classified Cloud Data Access Control for Public Security Based on TFR Model [J]. Computer Science, 2020, 47(6A): 400-403.
[8] LI Yong. Stock Investment Strategy Development Based on BigQuant Platform [J]. Computer Science, 2020, 47(6A): 612-615.
[9] GE Yu-ming, HAN Qing-wen, WANG Miao-qiong, ZENG Ling-qiu, LI Lu. Application Mode and Challenges of Vehicular Big Data [J]. Computer Science, 2020, 47(6): 59-65.
[10] LIU Ji-qin, SHI Kai-quan. Big Data Decomposition-Fusion and Its Intelligent Acquisition [J]. Computer Science, 2020, 47(6): 66-73.
[11] ZENG Wei-liang, WU Miao-sen, SUN Wei-jun, XIE Sheng-li. Comprehensive Review of Autonomous Taxi Dispatching Systems [J]. Computer Science, 2020, 47(5): 181-189.
[12] ZHONG Ya,GUO Yuan-bo,LIU Chun-hui,LI Tao. User Attributes Profiling Method and Application in Insider Threat Detection [J]. Computer Science, 2020, 47(3): 292-297.
[13] RAO Meng,MIAO Duo-qian,LUO Sheng. Rough Uncertain Image Segmentation Method [J]. Computer Science, 2020, 47(2): 72-75.
[14] JIAO Yang, YANG Chuan-ying, SHI Bao. Relevance Feedback Method Based on SVM in Shoeprint Images Retrieval [J]. Computer Science, 2020, 47(11A): 244-247.
[15] YAO Li-shuang, LIU Dan, PEI Zuo-fei, WANG Yun-feng. Real-time Network Traffic Prediction Model Based on EMD and Clustering [J]. Computer Science, 2020, 47(11A): 316-320.
Full text



[1] . [J]. Computer Science, 2018, 1(1): 1 .
[2] LEI Li-hui and WANG Jing. Parallelization of LTL Model Checking Based on Possibility Measure[J]. Computer Science, 2018, 45(4): 71 -75 .
[3] SUN Qi, JIN Yan, HE Kun and XU Ling-xuan. Hybrid Evolutionary Algorithm for Solving Mixed Capacitated General Routing Problem[J]. Computer Science, 2018, 45(4): 76 -82 .
[4] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[5] WU Jian-hui, HUANG Zhong-xiang, LI Wu, WU Jian-hui, PENG Xin and ZHANG Sheng. Robustness Optimization of Sequence Decision in Urban Road Construction[J]. Computer Science, 2018, 45(4): 89 -93 .
[6] SHI Wen-jun, WU Ji-gang and LUO Yu-chun. Fast and Efficient Scheduling Algorithms for Mobile Cloud Offloading[J]. Computer Science, 2018, 45(4): 94 -99 .
[7] ZHOU Yan-ping and YE Qiao-lin. L1-norm Distance Based Least Squares Twin Support Vector Machine[J]. Computer Science, 2018, 45(4): 100 -105 .
[8] LIU Bo-yi, TANG Xiang-yan and CHENG Jie-ren. Recognition Method for Corn Borer Based on Templates Matching in Muliple Growth Periods[J]. Computer Science, 2018, 45(4): 106 -111 .
[9] GENG Hai-jun, SHI Xin-gang, WANG Zhi-liang, YIN Xia and YIN Shao-ping. Energy-efficient Intra-domain Routing Algorithm Based on Directed Acyclic Graph[J]. Computer Science, 2018, 45(4): 112 -116 .
[10] CUI Qiong, LI Jian-hua, WANG Hong and NAN Ming-li. Resilience Analysis Model of Networked Command Information System Based on Node Repairability[J]. Computer Science, 2018, 45(4): 117 -121 .