Clustering-based Improved K-means Text Feature Selection

Computer Science ›› 2011, Vol. 38 ›› Issue (1): 195-197.

LIU Hai-feng, LIU Shou-sheng, ZHANG Xue-ren

(College of Science, PLA University of Science and Technology, Nanjing 210007)

Online: 2018-11-16 Published: 2018-11-16
Funding:
This work was supported by the National Natural Science Foundation of China (Grant No. 70571087).

Abstract

Abstract: Text feature dimensionality reduction is a core technology in automatic text categorization. K-means is a commonly used partition-based clustering method. To address the algorithm's excessive sensitivity to the initial cluster centers and to isolated points, an improved K-means algorithm is proposed for text feature selection. By optimizing how the initial cluster centers are selected and by eliminating isolated points, the clustering of text features is improved. Subsequent text classification experiments show that the proposed improved K-means algorithm has good feature selection ability and yields efficient text categorization.

Key words: Feature selection, Clustering, K-means, Text categorization
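
The abstract names two modifications to K-means (a better choice of initial centers and the removal of isolated points) but does not give their details. The Python sketch below only illustrates the general idea under stated assumptions: initial centers are picked by farthest-first traversal, and a point counts as isolated when its average distance to the other points exceeds `factor` times the overall average. Both rules, and all function and parameter names, are hypothetical stand-ins and not necessarily the authors' method.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def drop_isolated(points, factor=2.0):
    """Assumed rule: drop points whose average distance to the other
    points exceeds factor * (overall average). Needs >= 2 points."""
    avg = [sum(dist(p, q) for q in points) / (len(points) - 1) for p in points]
    overall = sum(avg) / len(avg)
    return [p for p, a in zip(points, avg) if a <= factor * overall]


def farthest_first_centers(points, k):
    """Assumed rule: start from the point nearest the data mean, then
    repeatedly add the point farthest from all centers chosen so far."""
    center = mean(points)
    centers = [min(points, key=lambda p: dist(p, center))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Standard Lloyd iterations, seeded by farthest_first_centers."""
    centers = farthest_first_centers(points, k)
    clusters = []
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[idx].append(p)
        # Recompute centers; an empty cluster keeps its old center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    # Toy feature vectors: two tight groups plus one far-away point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11), (100, 100)]
    kept = drop_isolated(data)
    centers, clusters = kmeans(kept, 2)
    print(len(kept), centers)
```

In a text setting each vector would describe one term (for example, its distribution over documents or classes), and one representative term per cluster could then be kept as a selected feature. That final step is likewise an assumption, since the abstract does not describe it.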

LIU Hai-feng, LIU Shou-sheng, ZHANG Xue-ren. Clustering-based Improved K-means Text Feature Selection[J]. Computer Science, 2011, 38(1): 195-197. https://doi.org/
