最近邻优化的k-means聚类算法

摘要/Abstract

摘要： 传统的k-means算法不论其数据样本的分布情况,将簇边缘位置、簇中心位置、离群点的数据样本全部按照最小距离原则,划分到离它最近的聚类中心所在簇中,没有考虑数据样本与其他簇之间的关系。如果数据样本与另一簇中心的距离接近于最小距离,则此数据样本与两个簇的关系都很大,显然这样直接划分并不合理。针对此问题,文中提出了最近邻优化的k-means聚类算法。运用近邻的思想,将这些不“很属于”某簇的数据样本划分到其最近邻数据样本所在的簇中,实验结果表明,这种最近邻优化的k-means聚类算法有效地减少了算法的迭代次数,提高了算法的聚类准确度,得到了良好的聚类效果。

关键词: k-means, 簇, 分布, 关系, 最近邻

Abstract: Traditional k-means algorithms usually ignores the distribution of the data samples,assign all of them in the cluster edge position,center position,outliers to the cluster which nearest clustering center locates,in accordance with the principle of minimum distance,without considering the relationsh1ip between the data sample and other clusters.If the distance between the data sample and the other cluster is close to the minimum distance,the data sample is very close to the two clusters,obviously,the direct division menthod is not reasonable.Aiming at this problem,this paper presented a clustering algorithm optimized nearest neighbor (1NN-kmeans).Using the ideas of neighbor,assign these samples that do not firmly belong to a certain cluster to the cluster that the nearest neighbor sample belongs to.The experimental results show that 1NN effectively reduced the number of iterations and improved the clustering accuracy and finally achieved the better clustering results.

Key words: Cluster, Distribution, K-means, Nearest neighbor, Relationship

中图分类号:

TP181

林涛, 赵璨. 最近邻优化的k-means聚类算法[J]. 计算机科学, 2019, 46(11A): 216-219. https://doi.org/

LIN Tao, ZHAO Can. Nearest Neighbor Optimization k-means Clustering Algorithm[J]. Computer Science, 2019, 46(11A): 216-219. https://doi.org/

参考文献

[1]高曼,韩勇,陈戈,等.基于K-means聚类算法的公交行程速度计算模型[J].计算机科学,2016,43(S1):422-424,439.
[2]赵建民,管国权,王红艳.基于遗传算法的硬聚类算法改进[J].计算机工程与科学,2008(8):83-85.
[3]唐胡鑫.电子商务客户忠诚度模型仿真研究[J].计算机仿真,2016,33(1):413-415,424.
[4]王勇,唐靖,饶勤菲,等.高效率的K-means最佳聚类数确定算法[J].计算机应用,2014,34(5):1331-1335.
[5]谢娟英,王艳娥.最小方差优化初始聚类中心的K-means算法[J].计算机工程,2014,40(8):205-211,223.
[6]郁启麟.K-means算法初始聚类中心选择的优化[J].计算机系统应用,2017,26(5):170-174.
[7]邢长征,谷浩.基于平均密度优化初始聚类中心的k-means算法[J].计算机工程与应用,2014,50(20):135-138.
[8]朴尚哲,超木日力格,于剑.模糊C均值算法的聚类有效性评价[J].模式识别与人工智能,2015,28(5):452-461.
[9]马闯,吴涛,段梦雅.基于K近邻隶属度的聚类算法研究[J].计算机工程与应用,2016,52(10):55-58,117.
[10]王超学,潘正茂,马春森,等.改进型加权KNN算法的不平衡数据集分类[J].计算机工程,2012,38(20):160-163,168.
[11]华辉有,陈启买,刘海,等.一种融合Kmeans和KNN的网络入侵检测算法[J].计算机科学,2016,43(3):158-162.
[12]苏毅娟,邓振云,程德波,等.大数据下的快速KNN分类算法[J].计算机应用研究,2016,33(4):1003-1006,1023.
[13]ARTHUR D,VASSILVITSKII S.k-means++:the advantages of careful seeding[C]∥Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms.Society for Industrial and Applied Mathematics Philadelphia,PA,USA,2007:1027-1035.
[14]余秀雅,刘东平,杨军.基于K-means++的无线传感网分簇算法研究[J].计算机应用研究,2017,34(1):181-185.
[15]ASUNCION A,NEWMAN D J.UCI machine learning repository[EB/OL].[2009-12-23].http://archive.ics.uci.edu／.

相关文章 15

[1]	鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263
[2]	金方焱, 王秀利. 融合RACNN和BiLSTM的金融领域事件隐式因果关系抽取 Implicit Causality Extraction of Financial Events Integrating RACNN and BiLSTM 计算机科学, 2022, 49(7): 179-186. https://doi.org/10.11896/jsjkx.210500190
[3]	单晓英, 任迎春. 基于改进麻雀搜索优化支持向量机的渔船捕捞方式识别 Fishing Type Identification of Marine Fishing Vessels Based on Support Vector Machine Optimized by Improved Sparrow Search Algorithm 计算机科学, 2022, 49(6A): 211-216. https://doi.org/10.11896/jsjkx.220300216
[4]	李丹丹, 吴宇翔, 朱聪聪, 李仲康. 基于多种改进策略的改进麻雀搜索算法 Improved Sparrow Search Algorithm Based on A Variety of Improved Strategies 计算机科学, 2022, 49(6A): 217-222. https://doi.org/10.11896/jsjkx.210700032
[5]	鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于DBSCAN聚类的集群联邦学习方法 Clustered Federated Learning Methods Based on DBSCAN Clustering 计算机科学, 2022, 49(6A): 232-237. https://doi.org/10.11896/jsjkx.211100059
[6]	王欣, 向明月, 李思颖, 赵若成. 基于隐马尔可夫模型的铁路出行团体关系预测研究 Relation Prediction for Railway Travelling Group Based on Hidden Markov Model 计算机科学, 2022, 49(6A): 247-255. https://doi.org/10.11896/jsjkx.210500001
[7]	黄少滨, 孙雪薇, 李熔盛. 基于跨句上下文信息的神经网络关系分类方法 Relation Classification Method Based on Cross-sentence Contextual Information for Neural Network 计算机科学, 2022, 49(6A): 119-124. https://doi.org/10.11896/jsjkx.210600150
[8]	许思雨, 秦克云. 基于剩余格的模糊粗糙集的拓扑性质 Topological Properties of Fuzzy Rough Sets Based on Residuated Lattices 计算机科学, 2022, 49(6A): 140-143. https://doi.org/10.11896/jsjkx.210200123
[9]	杨亚红, 王海瑞. 基于Renyi熵和BiGRU算法实现SDN环境下的DDoS攻击检测方法 DDoS Attack Detection Method in SDN Environment Based on Renyi Entropy and BiGRU Algorithm 计算机科学, 2022, 49(6A): 555-561. https://doi.org/10.11896/jsjkx.210800095
[10]	傅丽玉, 陆歌皓, 吴义明, 罗娅玲. 区块链技术的研究及其发展综述 Overview of Research and Development of Blockchain Technology 计算机科学, 2022, 49(6A): 447-461. https://doi.org/10.11896/jsjkx.210600214
[11]	梁懿雯, 杜育松. 抵御计时攻击的基于Knuth-Yao的二元离散高斯采样算法 Timing Attack Resilient Sampling Algorithms for Binary Gaussian Based on Knuth-Yao 计算机科学, 2022, 49(6A): 485-489. https://doi.org/10.11896/jsjkx.210600017
[12]	朱旭东, 熊贇. 基于样本分布损失的图像多标签分类研究 Study on Multi-label Image Classification Based on Sample Distribution Loss 计算机科学, 2022, 49(6): 210-216. https://doi.org/10.11896/jsjkx.210300267
[13]	赵丹丹, 黄德根, 孟佳娜, 董宇, 张攀. 基于BERT-GRU-ATT模型的中文实体关系分类 Chinese Entity Relations Classification Based on BERT-GRU-ATT 计算机科学, 2022, 49(6): 319-325. https://doi.org/10.11896/jsjkx.210600123
[14]	孙浩, 毛瀚宇, 张岩峰, 于戈, 徐石成, 何光宇. 区块链跨链技术发展及应用 Development and Application of Blockchain Cross-chain Technology 计算机科学, 2022, 49(5): 287-295. https://doi.org/10.11896/jsjkx.210800132
[15]	陆亮, 孔芳. 面向对话的融入知识的实体关系抽取 Dialogue-based Entity Relation Extraction with Knowledge 计算机科学, 2022, 49(5): 200-205. https://doi.org/10.11896/jsjkx.210300198

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed