Computer Science, 2011, Vol. 38, Issue (1): 195-197.

• Databases and Data Mining •

Clustering-based Improved K-means Text Feature Selection

LIU Hai-feng, LIU Shou-sheng, ZHANG Xue-ren

  1. (College of Science, PLA University of Science and Technology, Nanjing 210007)
  • Online: 2018-11-16  Published: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China (Grant No. 70571087).

Abstract: Text feature dimensionality reduction is a core technology in automatic text categorization. K-means is a commonly used partition-based clustering method, but it is overly sensitive to the initial cluster centers and to isolated points (outliers). To address this, an improved K-means algorithm is proposed for text feature selection. Optimizing how the initial cluster centers are chosen and eliminating isolated points improves the quality of text feature clustering. Subsequent text classification experiments show that the proposed improved K-means algorithm has good feature selection ability and yields efficient text categorization.

Key words: Feature selection, Clustering, K-means, Text categorization
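
The two ideas named in the abstract, removing isolated points and choosing better initial centers before running K-means, can be illustrated with a minimal sketch. The abstract does not state the paper's actual outlier criterion or initialization rule, so everything below is an assumption made for illustration: the mean-distance threshold in `remove_outliers`, the farthest-first rule in `farthest_first_centers`, the `factor` parameter, and all function names are hypothetical and are not the authors' method. The points stand in for feature vectors.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_outliers(points, factor=2.0):
    """Assumed criterion: drop a point if its mean distance to the other
    points exceeds `factor` times the average of those mean distances."""
    n = len(points)
    if n < 3:
        return list(points)
    mean_d = [sum(dist(p, q) for q in points) / (n - 1) for p in points]
    overall = sum(mean_d) / n
    return [p for p, d in zip(points, mean_d) if d <= factor * overall]


def farthest_first_centers(points, k):
    """Assumed initialization: start at the point nearest the global mean,
    then repeatedly add the point farthest from all chosen centers."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    centers = [min(points, key=lambda p: dist(p, mean))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points, k, max_iter=100):
    """Standard K-means iterations, started from the centers chosen above."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                dim = len(members[0])
                centers[j] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return centers, labels


# Toy data: two tight groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (100, 100)]
cleaned = remove_outliers(points)
centers, labels = kmeans(cleaned, k=2)
```

On this toy data the isolated point is discarded before clustering, so it cannot pull a center away from either group, and the deterministic initialization avoids the run-to-run variation of random starting centers.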
