Computer Science (计算机科学) ›› 2010, Vol. 37 ›› Issue (10): 165-168.

• Database and Data Mining •

Hierarchical Classification Approach of Hierarchical Feature Selection and Error Control

WU Bi-jun (吴碧军), LI Juan-zi (李涓子), JIN Xin (金鑫)

  1. (Department of Computer Science, Tsinghua University, Beijing 100084)
  • Online: 2018-12-01 Published: 2018-12-01
  • Funding:
    This work was supported by the National 973 Program of China (No. 2007CB310803).

Abstract: The Chinese news subject classification specification defines thousands of subject categories. Applying it to news classification leads to large models and long training times, and in particular requires complete retraining whenever some of the categories change. Because the categories in the specification are organized hierarchically, hierarchical classification is a natural solution. We study hierarchical classification of Chinese news and improve it in two respects: 1) layered feature calculation, which addresses the fact that a news item needs a different feature-vector representation under each category of the hierarchy; 2) error control, which addresses the case where a misclassification at an upper layer prevents a news item from reaching its correct category at a deeper layer. Experiments show that hierarchical classification improves accuracy by about 4% over flat classification, that hierarchical classification with feature weights recomputed at each layer improves accuracy by about 3% over plain hierarchical classification, and that hierarchical classification with error control also improves on plain hierarchical classification by about 3%.

Key words: Hierarchical classification, Support vector machine, Chinese news subject classification specification, Feature calculation, Error control
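
The abstract does not say how the per-layer feature calculation or the error control is implemented, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a two-level taxonomy, recomputes TF-IDF weights at every node from the documents under that node, and replaces the support vector machine with a dependency-free nearest-centroid classifier. Error control is approximated by keeping the top `k` candidates at the upper layer instead of committing to the single best one. All names (`Node`, `train`, `classify`, `k`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def idf_weights(docs):
    """Smoothed IDF computed only from the documents under one node."""
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """L2-normalised TF-IDF vector; tokens unseen at this node are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


class Node:
    """One internal node: its own feature weights plus one centroid per child."""

    def __init__(self, docs, labels):
        self.idf = idf_weights(docs)
        sums = defaultdict(lambda: defaultdict(float))
        for d, y in zip(docs, labels):
            for t, v in vectorize(d, self.idf).items():
                sums[y][t] += v
        self.centroids = {}
        for y, s in sums.items():
            norm = math.sqrt(sum(v * v for v in s.values())) or 1.0
            self.centroids[y] = {t: v / norm for t, v in s.items()}

    def scores(self, tokens):
        """(cosine similarity, child label) pairs, best first."""
        x = vectorize(tokens, self.idf)
        return sorted(
            ((sum(x.get(t, 0.0) * w for t, w in c.items()), y)
             for y, c in self.centroids.items()),
            reverse=True,
        )


def train(samples):
    """samples: list of (tokens, (top_label, leaf_label))."""
    root = Node([d for d, _ in samples], [p[0] for _, p in samples])
    children = {}
    for top in {p[0] for _, p in samples}:
        sub = [(d, p[1]) for d, p in samples if p[0] == top]
        children[top] = Node([d for d, _ in sub], [y for _, y in sub])
    return root, children


def classify(tokens, root, children, k=1):
    """k=1 is greedy top-down; k>1 keeps several upper-layer candidates alive."""
    best = None
    for top_score, top in root.scores(tokens)[:k]:
        leaf_score, leaf = children[top].scores(tokens)[0]
        cand = (top_score * leaf_score, top, leaf)
        if best is None or cand > best:
            best = cand
    return best[1], best[2]


# Toy data, invented purely for illustration.
samples = [
    (["goal", "match", "football"], ("sports", "football")),
    (["dunk", "match", "basketball"], ("sports", "basketball")),
    (["stock", "market", "shares"], ("finance", "stocks")),
    (["bank", "loan", "rate"], ("finance", "banking")),
]
root, children = train(samples)
print(classify(["goal", "football"], root, children, k=2))
```

Because each `Node` holds its own weights and centroids, changing the categories under one parent only requires retraining that parent's node, which is the retraining benefit the abstract attributes to the hierarchical approach. Multiplying the upper-layer and lower-layer scores is just one simple way to compare candidate paths; other combinations are equally plausible.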
