Computer Science (计算机科学), 2025, Vol. 52, Issue (8): 162-170. doi: 10.11896/jsjkx.240700017

• Database & Big Data & Data Science

Clustering Algorithm Based on Improved SOM Model

JIANG Rui, FAN Shuwen, WANG Xiaoming, XU Youyun

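  1. School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Received: 2024-07-05  Revised: 2024-10-26  Online: 2025-08-15  Published: 2025-08-08
  • Corresponding author: JIANG Rui (j_ray@njupt.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (62371246)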

  • About author: JIANG Rui, born in 1985, Ph.D., associate professor. His main research interests include artificial intelligence and wireless communication.

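Abstract: During training of the self-organizing map (SOM) model, data from different classes affect the update of the weight matrix differently: updates driven by one class pull the feature vectors of the winning neurons of the other classes away from those classes' data features, which lowers the clustering accuracy of the algorithm. To address this problem, an improved confidence-based SOM model (icSOM) is proposed. First, the sample data are pre-classified by the K-means algorithm, giving model training more information about the data. Next, the data in each pre-classified group are used to train a separate, independent SOM model, which eliminates interference between classes. Finally, building on the traditional SOM model, the concept of a confidence matrix is introduced: the confident neuron is obtained by jointly weighing the confidence of each winning neuron and its Euclidean distance to the input, and the input is assigned the cluster label of the class to which the confident neuron belongs. Clustering experiments with icSOM on the Iris and Wine datasets show that the proposed algorithm handles the sample data better and achieves good clustering performance.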
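
The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' implementation. The abstract does not say how the confidence matrix is computed or how confidence and distance are combined, so `hit_confidence` (the share of a class's training samples won by each neuron) and the additive score in `icsom_predict` (distance plus `lam * (1 - confidence)`, lowest wins) are hypothetical stand-ins. The grid size, learning-rate and neighbourhood schedules, and all function names are likewise assumptions.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(nodes, x):
    """Index of the neuron whose weight vector is closest to x (the winning neuron)."""
    return min(range(len(nodes)), key=lambda i: dist(x, nodes[i][1]))


def kmeans(data, k, iters=50, seed=0):
    """Stage 1: plain Lloyd's K-means; returns one pre-classification label per sample."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(x, centers[j])) for x in data]
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_som(samples, rows=3, cols=3, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Stage 2: online SOM on a rows x cols grid; returns [(grid_pos, weights), ...]."""
    rng = random.Random(seed)
    # Each neuron starts as a copy of a randomly chosen training sample.
    nodes = [((r, c), list(rng.choice(samples))) for r in range(rows) for c in range(cols)]
    for t in range(epochs):
        decay = 1.0 - t / epochs              # linear decay of rate and radius
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.1)      # keep the Gaussian radius positive
        for x in rng.sample(samples, len(samples)):  # one shuffled pass
            br, bc = nodes[nearest(nodes, x)][0]
            for (r, c), w in nodes:
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])  # pull neuron toward x
    return nodes


def hit_confidence(nodes, samples):
    """HYPOTHETICAL confidence: fraction of a class's samples that each neuron wins."""
    hits = [0] * len(nodes)
    for x in samples:
        hits[nearest(nodes, x)] += 1
    return [h / len(samples) for h in hits]


def icsom_fit(data, k, **som_kw):
    """K-means pre-classification, then one independent SOM (plus confidences) per class."""
    labels = kmeans(data, k)
    models = {}
    for lab in sorted(set(labels)):
        members = [x for x, y in zip(data, labels) if y == lab]
        nodes = train_som(members, **som_kw)
        models[lab] = (nodes, hit_confidence(nodes, members))
    return models


def icsom_predict(models, x, lam=1.0):
    """Stage 3: choose the 'confident neuron' across classes; return its class label.

    HYPOTHETICAL rule: distance to each class's winning neuron plus a penalty
    lam * (1 - confidence); the class with the lowest score wins.
    """
    best_label, best_score = None, math.inf
    for lab, (nodes, conf) in models.items():
        i = nearest(nodes, x)
        score = dist(x, nodes[i][1]) + lam * (1.0 - conf[i])
        if score < best_score:
            best_label, best_score = lab, score
    return best_label


if __name__ == "__main__":
    rng = random.Random(1)
    blob_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
    blob_b = [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(20)]
    models = icsom_fit(blob_a + blob_b, k=2)
    print([icsom_predict(models, x) for x in blob_a + blob_b])
```

Training each pre-class on its own map is what keeps one class's updates from dragging another class's neurons. Because the hypothetical score adds a raw distance to a dimensionless penalty, `lam` is only meaningful when features are on comparable scales (for Iris or Wine, standardised first). The paper's actual confidence and combination rules may differ.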

Key words: Machine learning, Unsupervised learning, Clustering, Self-organizing feature map neural network


CLC number: TP181