Computer Science ›› 2020, Vol. 47 ›› Issue (3): 110-115. doi: 10.11896/jsjkx.190700041

• Database & Big Data & Data Science •

Keywords Extraction Method Based on Semantic Feature Fusion

GAO Nan, LI Li-juan, Wei-william LEE, ZHU Jian-ming

  1. (School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
  • Received: 2019-06-04 Online: 2020-03-15 Published: 2020-03-30
  • About author: GAO Nan, born in 1983, Ph.D., is a member of China Computer Federation. Her main research interests include data mining, machine learning and intelligent transportation systems.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61702456) and Zhejiang Public Welfare Technology Research Program (2017C33108).

Abstract: Keyword extraction is widely used in text mining and is a prerequisite for automatic text summarization, classification and clustering, so extracting high-quality keywords is important. Most existing keyword extraction methods consider only statistical features and ignore the implicit semantic features of words, which leads to low extraction accuracy and to keywords that lack semantic information. To address this problem, this paper designs a method for quantifying the relationship between words and the themes of a text. First, word vectors are used to mine the contextual semantic relations of words. Then, the main semantic features of the text are extracted by clustering. Finally, the distance between each word and the topic is calculated with a similarity distance measure and taken as the semantic feature of the word. By combining this semantic feature with word frequency, length, position, linguistic and other descriptive features, a keyword extraction method for short text based on semantic features, named SFKE, is proposed. The method analyzes the importance of words from both statistical and semantic perspectives, and can therefore extract the most relevant keyword set by integrating multiple factors. Experimental results show that the multi-feature method improves significantly on the TFIDF, TextRank, Yake, KEA and AE methods; its F-Score is 9.3% higher than that of AE. In addition, information gain is used to evaluate the importance of the features, and the experimental results show that the F-Score of the model increases by 7.2% after the semantic feature is added.
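The semantic feature described in the abstract (word vectors, clustering, then a word-to-topic distance) can be illustrated with a minimal sketch. This is not the SFKE implementation: the toy 2-D vectors stand in for trained word vectors, the small cosine-based k-means with a deterministic start is a simplification, and the choice of the largest cluster as the "main topic" is an assumption. The names `main_topic` and `semantic_feature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def assign(vectors, centroids):
    """Group vectors by their most similar centroid."""
    clusters = [[] for _ in centroids]
    for v in vectors:
        best = max(range(len(centroids)), key=lambda i: cosine(v, centroids[i]))
        clusters[best].append(v)
    return clusters


def main_topic(vectors, k=2, iterations=20):
    """Toy k-means; returns the centroid of the largest cluster (assumed main topic)."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic start, illustration only
    for _ in range(iterations):
        clusters = assign(vectors, centroids)
        updated = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    clusters = assign(vectors, centroids)
    largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
    return centroids[largest]


def semantic_feature(word_vectors, k=2):
    """Map each word to its cosine similarity with the main topic centroid."""
    topic = main_topic(list(word_vectors.values()), k)
    return {word: cosine(vec, topic) for word, vec in word_vectors.items()}


# Hypothetical toy vectors; a real pipeline would use trained word embeddings.
toy_vectors = {
    "keyword": [1.0, 0.1],
    "extraction": [0.9, 0.2],
    "semantic": [0.8, 0.3],
    "weather": [0.1, 1.0],
}
scores = semantic_feature(toy_vectors, k=2)
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.3f}")
```

In a fuller pipeline, a score like this would be one column in a feature vector alongside word frequency, length and position, which is then passed to a classifier; the abstract and key words suggest a support vector machine for that step.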

Key words: Classification model, Semantic features, Statistical features, Support vector machine, Text mining

CLC Number: 

  • TP391