基于自适应中文分词和近似SVM的文本分类算法

Computer Science, 2010, Vol. 37, Issue (1): 251-254.

Text Classification Algorithm Based on Adaptive Chinese Word Segmentation and Proximal SVM

FENG Yong, LI Hua, ZHONG Jiang, YE Chun-xiao

Online:2018-12-01 Published:2018-12-01

Abstract: New-word recognition and ambiguity resolution are key problems in Chinese word segmentation. The results of traditional dictionary-based matching algorithms depend largely on how representative the dictionary is, so they cannot recognize new words effectively, especially in specialized professional domains. The Chinese word segmentation method in this paper is based on a 2-gram statistical model and meets application requirements for both accuracy and efficiency. PSVM (proximal support vector machine) treats classification as a quadratic programming problem with linear equality constraints. This paper describes a text classification algorithm based on adaptive Chinese word segmentation and PSVM, which offers faster training and smaller memory requirements. Experiments on several data sets show that the classification algorithm adapts automatically to knowledge management in professional domains and achieves better classification performance under time-sensitive conditions.

Key words: Adaptive Chinese word segmentation, Proximal support vector machines, Text classification, Knowledge management
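
The abstract's remark that PSVM reduces classification to an equality-constrained quadratic program can be illustrated with a minimal sketch. It assumes the standard linear PSVM formulation (minimize ν/2·||y||² + 1/2·(w'w + γ²) subject to D(Aw − eγ) + y = e), whose solution comes from a single linear system. The abstract does not give the authors' exact formulation, so the function names `psvm_train` and `psvm_predict`, the parameter `nu`, and the toy data are illustrative assumptions. Documents are assumed to have already been segmented and converted to numeric feature vectors.

```python
import numpy as np

def psvm_train(A, d, nu=1.0):
    """Train a linear PSVM (assumed standard formulation, not taken from the paper).

    A  : (m, n) feature matrix, one row per document
    d  : (m,) labels in {+1, -1}
    nu : weight on the error term (hypothetical default)
    """
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])      # E = [A, -e]
    # Equality constraints let the QP collapse to one linear system:
    # (I/nu + E^T E) [w; gamma] = E^T D e, where D e is simply the label vector d.
    lhs = np.eye(n + 1) / nu + E.T @ E
    rhs = E.T @ d
    z = np.linalg.solve(lhs, rhs)
    return z[:n], z[n]                         # w, gamma

def psvm_predict(A, w, gamma):
    """Assign +1 or -1 according to the side of the plane x'w = gamma."""
    return np.where(A @ w - gamma >= 0, 1, -1)

# Toy usage with made-up 2-D points (not data from the paper)
A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = psvm_train(A, d, nu=10.0)
labels = psvm_predict(A, w, gamma)
```

Because training solves only an (n + 1) × (n + 1) linear system rather than an inequality-constrained QP, this is consistent with the training-speed and memory advantages the abstract claims, although the sketch itself makes no performance measurement.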

FENG Yong, LI Hua, ZHONG Jiang, YE Chun-xiao. Text Classification Algorithm Based on Adaptive Chinese Word Segmentation and Proximal SVM[J]. Computer Science, 2010, 37(1): 251-254.