Computer Science ›› 2012, Vol. 39 ›› Issue (3): 196-199.

• Artificial Intelligence •

Document Classification Algorithm Based on Manifold Regularization

XU Hai-rui, ZHANG Wen-sheng, WU Shuang

  1. (Institute of Automation, Chinese Academy of Sciences, Beijing 100190)
  • Online: 2018-11-16  Published: 2018-11-16

Abstract: A document classification algorithm based on the manifold regularization framework, called MLD-RLSC, is presented to address high-dimensional document classification. The algorithm constructs a nearest-neighbor graph over the training samples to estimate the intrinsic geometric structure of the data space and uses it as a manifold regularization term. This term is incorporated into the objective function of a multivariate linear regression to obtain a low-dimensional manifold representation of the high-dimensional documents. Classification and prediction in the low-dimensional feature space are then performed with a k-nearest-neighbor (kNN) classifier, yielding a classifier for multi-class problems. MLD-RLSC makes full use of the class labels of the training samples to extract effective features. Experimental results on the Reuters 21578 dataset show that the proposed algorithm achieves higher classification accuracy and faster running speed than traditional classifiers.

Key words: Local discriminant embedding (LDE), Manifold learning, Text categorization, kNN, Manifold regularization
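
The abstract does not give the objective function, so the pipeline it describes can only be illustrated under assumptions. The sketch below assumes a Laplacian-regularized least-squares form: a projection W minimizing ||XW - Y||^2 + lam_ridge ||W||^2 + lam_manifold tr(W^T X^T L X W), where Y holds one-hot class labels and L is the Laplacian of a k-nearest-neighbor graph. This may differ from the paper's actual MLD-RLSC formulation. All function names, parameters, and default values are hypothetical.

```python
import numpy as np


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - A of a symmetrized kNN graph (assumed form)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between training samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)  # make the adjacency symmetric
    return np.diag(A.sum(axis=1)) - A


def fit_projection(X, y, n_classes, k=5, lam_ridge=1e-2, lam_manifold=1e-1):
    """Closed-form solution of the assumed manifold-regularized regression."""
    Y = np.eye(n_classes)[y]  # one-hot targets, shape (n, n_classes)
    L = knn_graph_laplacian(X, k)
    d = X.shape[1]
    G = X.T @ X + lam_ridge * np.eye(d) + lam_manifold * (X.T @ L @ X)
    return np.linalg.solve(G, X.T @ Y)  # W, shape (d, n_classes)


def knn_predict(Z_train, y_train, Z_test, n_classes, k=5):
    """Majority vote among the k nearest training points in the projected space."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[idx]  # neighbor labels, shape (m, k)
    return np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes])
```

Under these assumptions, training documents `X_train` (for example, TF-IDF vectors) with integer labels `y_train` would be projected with `W = fit_projection(X_train, y_train, n_classes)`, and new documents classified with `knn_predict(X_train @ W, y_train, X_test @ W, n_classes)`. The projected space has one dimension per class, which is why the kNN step is cheap even when the original vocabulary is large.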
