Computer Science ›› 2012, Vol. 39 ›› Issue (10): 182-186.

• Databases and Data Mining •

Fast kNN Text Classification Algorithm Based on Area Division

HU Yuan, SHI Bing

  1. (School of Computer Science and Technology, Shandong University, Jinan 250101); (Unit 77675 of the Chinese People's Liberation Army, Nyingchi 860000)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: As a simple, effective, non-parametric classification method, kNN is widely used in text classification. To improve its classification efficiency, we propose a fast kNN text classification algorithm based on area division. The training set is divided into several regions according to its spatial distribution, and the k nearest neighbors of each test sample are then found quickly from the positional relationship between the test sample and each region, which greatly reduces the computation required by kNN. Mathematical reasoning and experimental results both show that the algorithm significantly improves classification efficiency while leaving the accuracy of the kNN classifier unchanged.

Key words: Text classification, kNN algorithm, Clustering, k-means algorithm
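
The abstract does not state how the regions are built or which pruning rule is used, so the following is only a minimal sketch of the general idea under stated assumptions: regions come from a few k-means iterations (suggested by the key words), distances are Euclidean, and a region is skipped when the triangle-inequality bound d(q, center) - radius already exceeds the current k-th best distance. The names `build_regions`, `knn_regions`, and `classify` are hypothetical and are not taken from the paper.

```python
import heapq
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_regions(points, n_regions, n_iter=10, seed=0):
    """Partition the training points into regions with a few k-means iterations.

    Returns a list of (center, radius, member_indices) tuples.
    """
    rng = random.Random(seed)
    centers = [list(points[i]) for i in rng.sample(range(len(points)), n_regions)]
    dim = len(points[0])
    groups = []
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for i, p in enumerate(points):
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(i)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a region became empty
                centers[c] = [
                    sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    regions = []
    for center, members in zip(centers, groups):
        if members:
            # The radius is measured from the final center, so the bound below holds.
            radius = max(dist(points[i], center) for i in members)
            regions.append((center, radius, members))
    return regions


def knn_regions(query, points, regions, k):
    """Return the k nearest (distance, index) pairs, skipping hopeless regions."""
    # Triangle inequality: no member of a region is closer than d(q, center) - radius.
    bounds = sorted(
        (max(0.0, dist(query, center) - radius), idx)
        for idx, (center, radius, _) in enumerate(regions)
    )
    best = []  # max-heap via negated distances: (-distance, point_index)
    for lower, idx in bounds:
        if len(best) == k and lower >= -best[0][0]:
            break  # all remaining regions are at least this far away
        for i in regions[idx][2]:
            d = dist(query, points[i])
            if len(best) < k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in best)


def classify(query, points, labels, regions, k):
    """Majority vote over the k nearest neighbours found by knn_regions."""
    votes = Counter(labels[i] for _, i in knn_regions(query, points, regions, k))
    return votes.most_common(1)[0][0]
```

Because a region is skipped only when its lower bound cannot beat the current k-th distance, the neighbor distances match a brute-force scan, which is consistent with the abstract's claim of unchanged accuracy. Text classifiers often use cosine similarity on term vectors instead; adapting the bound to that measure is outside this sketch.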
