Computer Science ›› 2020, Vol. 47 ›› Issue (3): 110-115. doi: 10.11896/jsjkx.190700041

Keywords Extraction Method Based on Semantic Feature Fusion

GAO Nan, LI Li-juan, Wei-william LEE, ZHU Jian-ming

  1. (School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
  • Received: 2019-06-04 Online: 2020-03-15 Published: 2020-03-30
  • About author: GAO Nan, born in 1983, Ph.D., is a member of the China Computer Federation. Her main research interests include data mining, machine learning and intelligent transportation systems.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61702456) and Zhejiang Public Welfare Technology Research Program (2017C33108).

Abstract: Keyword extraction is widely used in text mining and is a prerequisite for automatic text summarization, classification and clustering, so extracting high-quality keywords is very important. At present, most keyword extraction methods consider only statistical features and ignore the implicit semantic features of words, which leads to low extraction accuracy and to keywords that lack semantic information. To solve this problem, this paper designs a method for quantifying the relationship between words and text themes. First, word vectors are used to mine the contextual semantic relations of words. Then, the main semantic features of the text are extracted by clustering. Finally, the distance between each word and the topic is calculated with a similarity distance measure and taken as the semantic feature of the word. In addition, by combining this semantic feature with word frequency, length, location, language and other descriptive features of words, a keyword extraction method for short text with semantic features, named SFKE, is proposed. The method analyzes the importance of words from both the statistical and the semantic aspects, and can therefore extract the most relevant keyword set by integrating multiple factors. Experimental results show that the keyword extraction method integrating multiple features achieves a significant improvement over the TFIDF, TextRank, Yake, KEA and AE methods; its F-Score is 9.3% higher than that of AE. In addition, information gain is used to evaluate the importance of the features. The experimental results show that the F-Score of the model increases by 7.2% after the semantic feature is added.
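
The semantic feature described in the abstract (word vectors, clustering, then a word-to-topic distance) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the clustering algorithm or the distance measure, so plain k-means with cosine similarity is assumed here, the 3-dimensional vectors in `toy_vectors` are hand-made stand-ins for trained word embeddings, and the names `kmeans`, `semantic_feature` and `feature_row` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, iters=20):
    """Naive k-means: the first k vectors seed the centroids (an assumption
    made for determinism), and assignment uses cosine similarity."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda j: cosine(v, centroids[j]))
            groups[best].append(v)
        updated = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def semantic_feature(word_vec, centroids):
    """Similarity of a word to its closest topic centroid (assumed measure)."""
    return max(cosine(word_vec, c) for c in centroids)

def feature_row(word, frequency, position, semantic):
    """Combine statistical features named in the abstract (frequency, length,
    location) with the semantic feature into one row for a classifier."""
    return [frequency, len(word), position, semantic]

# Hand-made toy embeddings; real ones would come from a trained word-vector model.
toy_vectors = {
    "keyword":    [0.90, 0.10, 0.0],
    "semantic":   [0.10, 0.90, 0.0],
    "extraction": [0.80, 0.20, 0.0],
    "feature":    [0.20, 0.80, 0.0],
    "phrase":     [0.85, 0.15, 0.0],
    "weather":    [0.40, 0.20, 0.9],  # deliberately off-topic word
}

centroids = kmeans(list(toy_vectors.values()), k=2)
scores = {w: semantic_feature(v, centroids) for w, v in toy_vectors.items()}
row = feature_row("keyword", frequency=3, position=0, semantic=scores["keyword"])
```

In this toy setup the off-topic word receives the lowest score, which is the behaviour the semantic feature is meant to capture. Rows such as `row` would then be passed to a classifier; the key words below mention a support vector machine, but the training step is omitted here.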

Key words: Text mining, Statistical features, Semantic features, Support vector machine, Classification model

CLC Number: 

  • TP391
