Computer Science ›› 2018, Vol. 45 ›› Issue (11): 292-297.doi: 10.11896/j.issn.1002-137X.2018.11.047

• Interdiscipline & Frontier • Previous Articles     Next Articles

Study on Parallel K-means Algorithm Based on CUDA

LIU Duan-yang, ZHENG Jiang-fan, SHEN Guo-jiang, LIU Zhi   

  1. (College of Computer Science and Technology,Zhejiang University of Technology,Hangzhou 310023,China)
  • Received:2017-10-30 Published:2019-02-25

Abstract: When k-means algorithm is confronted with large dataset,the computation time will grow exponentially with the increase of the dataset.In order to improve the computational performance of this algorithm,this paper designed a parallel k-means algorithm based on CUDA (Compute Unified Device Architecture) programming model,called GS_k-means.According to the parallel analysis of k-means algorithm,the global selection is used to determine whether the cluster of data is changed before distance calculation,thus reducing redundant computation.The calculation speed is accelerated by using the universal matrix multiplication in distance calculation.When the cluster center is updated,all data are grouped by cluster tag,and the data in the group are simply added,thus reducing the atomic memory operation andimproving the overall performance.The experimental results on KDDCUP99 datasetshow that on the condition of ensuring the accuracy of the experimental results,the improved algorithm the calculation speed and is five times faster than the classical GPUMiner algorithm.

Key words: Big data, CUDA, k-means, Parallel computation

CLC Number: 

  • TP391
[1]MACQUEEN J B.Some Methods for Classification and Analysis of MultiVariate Observations[C]∥Proceedings of Berkeley Symposium on Mathematical Statistics and Probability.1967:281-297.
[2]LE V H,KIM S R.K-strings algorithm,anewapproach based on Kmeans[C]∥Conference on Research in Adaptive and Convergent Systems.ACM,2015:15-20.
[3]KUMAR G R,MANGATHAYARU N,NARASIMHA G.An improved k-Means Clustering algorithm for Intrusion Detection using Gaussian function[C]∥The International Conference on Engineering &Mis.ACM,2015:1-7.
[4]NVIDIA Corporation.CUDA Technology[OL].http://www.nvidia.com/CUDA.
[5]ZECHNER M,GRANITZER M.Accelerating K-Means on theGraphics Processor via CU-DA[C]∥First International Con-ference on Intensive Applications and Services.IEEE Computer Society,2009:7-15.
[6]LI Y,ZHAO K,CHU X,et al.Speeding up K-Means Algorithm by GPUs[C]∥International Conference on Computer and Information Technology.IEEE,2010:115-122.
[7]ZHONG S,LIN S,XU G,et al.The expansibility research of K-Means algorithm under the GPU[C]∥IEEE International Conference on Software Engineering and Service Science.IEEE,2017:734-737.
[8]HUANG P,LI X,YUAN B.A Parallel GPU-Based Approach to Clustering Very Fast Data Streams[C]∥ACM International on Conference on Information and Knowledge Management.ACM,2015:23-32.
[9]WU J,HONG B.An Efficient k-Means Algorithm on CUDA[C]∥ IEEE International Symposiumon Parallel and Distributed Processing Workshops and Phd Forum.IEEE Computer Society,2011:1740-1749.
[10]HUI Z N.Multidimensional data clustering alogorithm research and GPU acceleration based Global K-means[D].Xi’an:Xidian University,2012.(in Chinees) 惠转妮.基于Global K-means的多维数据聚类算法研究及其GPU加速[D].西安:西安电子科技大学,2012.
[11]KAKOOEI M,SHAHHOSEINI H S.A parallel k-means clustering initial center selection and dynamic center correction on GPU[C]∥Electrical Engineering.IEEE,2015:20-25.
[12]BHIMANI J,LEESER M,MI N.Accelerating K-Means clustering with parallel implementations and GPU computing[C]∥High Performance Extreme Computing Conference.IEEE,2015:1-6.
[13]BAYDOUN M,DAWI M,GHAZIRI H.Enhanced parallel implementation of the K-Means clustering algorithm[C]∥International Conference on Advances in Computational TOOLS for Engineering Applications.IEEE,2016:7-11.
[14]ABBASITABAR H,SAMAVATIAN M H,SA-RBAZI-AZAD H.ASHA:An Adaptive Shared-Memory Sharing Architecture for Multi-Programmed GPUs[J].Microprocessors & Microsystems,2016,46:264-273.
[15]HETTICH S,BAY S D.KDD cup 1999 data[EB/OL].http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[16]WANG Q,MEGALOOIKONOMOU V.A clustering algorithm for intrusion detection[J].Proc Spie,2008,5812(5812):31-38.
[17]FANG W,LAU K K,LUM,et al.Parallel datamining on gra-phics processors[OL].http://11130.126.143.33/sites/default/files/papers/358/gpuminer.pdf.
[1] HE Qiang, YIN Zhen-yu, HUANG Min, WANG Xing-wei, WANG Yuan-tian, CUI Shuo, ZHAO Yong. Survey of Influence Analysis of Evolutionary Network Based on Big Data [J]. Computer Science, 2022, 49(8): 1-11.
[2] CHEN Jing, WU Ling-ling. Mixed Attribute Feature Detection Method of Internet of Vehicles Big Datain Multi-source Heterogeneous Environment [J]. Computer Science, 2022, 49(8): 108-112.
[3] WANG Jin, LIU Jiang. GPU-based Parallel DILU Preconditioning Technique [J]. Computer Science, 2022, 49(6): 108-118.
[4] SUN Xuan, WANG Huan-xiao. Capability Building for Government Big Data Safety Protection:Discussions from Technologicaland Management Perspectives [J]. Computer Science, 2022, 49(4): 67-73.
[5] WANG Mei-shan, YAO Lan, GAO Fu-xiang, XU Jun-can. Study on Differential Privacy Protection for Medical Set-Valued Data [J]. Computer Science, 2022, 49(4): 362-368.
[6] WANG Jun, WANG Xiu-lai, PANG Wei, ZHAO Hong-fei. Research on Big Data Governance for Science and Technology Forecast [J]. Computer Science, 2021, 48(9): 36-42.
[7] YU Yue-zhang, XIA Tian-yu, JING Yi-nan, HE Zhen-ying, WANG Xiao-yang. Smart Interactive Guide System for Big Data Analytics [J]. Computer Science, 2021, 48(9): 110-117.
[8] WANG Li-mei, ZHU Xu-guang, WANG De-jia, ZHANG Yong, XING Chun-xiao. Study on Judicial Data Classification Method Based on Natural Language Processing Technologies [J]. Computer Science, 2021, 48(8): 80-85.
[9] WANG Xue-cen, ZHANG Yu, LIU Ying-jie, YU Ge. Evaluation of Quality of Interaction in Online Learning Based on Representation Learning [J]. Computer Science, 2021, 48(2): 207-211.
[10] TENG Jian, TENG Fei, LI Tian-rui. Travel Demand Forecasting Based on 3D Convolution and LSTM Encoder-Decoder [J]. Computer Science, 2021, 48(12): 195-203.
[11] ZHANG Yu-long, WANG Qiang, CHEN Ming-kang, SUN Jing-tao. Survey of Intelligent Rain Removal Algorithms for Cloud-IoT Systems [J]. Computer Science, 2021, 48(12): 231-242.
[12] WEN Min-hua, WANG Shen-peng, WEI Jian-wen, LI Lin-ying, ZHANG Bin, LIN Xin-hua. DGX-2 Based Optimization of Application for Turbulent Combustion [J]. Computer Science, 2021, 48(12): 43-48.
[13] LIU Ya-chen, HUANG Xue-ying. Research on Creep Feature Extraction and Early Warning Algorithm Based on Satellite MonitoringSpatial-Temporal Big Data [J]. Computer Science, 2021, 48(11A): 258-264.
[14] ZHANG Guang-jun, ZHANG Xiang. Mechanism and Path of Optimizing Institution of Legislative Evaluation by Applying “Big Data+Blockchain” [J]. Computer Science, 2021, 48(10): 324-333.
[15] YE Ya-zhen, LIU Guo-hua, ZHU Yang-yong. Two-step Authorization Pattern of Data Product Circulation [J]. Computer Science, 2021, 48(1): 119-124.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!