Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 142-145.

• Intelligent Computing •

Text Keyword Extraction Method Based on Weighted TextRank

XU Li   

  1. School of Software, Shangqiu Polytechnic, Shangqiu, Henan 476100, China;
    Suzhou Research Institute, University of Science and Technology of China, Suzhou, Jiangsu 215000, China
  • Online: 2019-06-14 Published: 2019-07-02

Abstract: To improve the accuracy of keyword extraction, a text keyword extraction method is proposed. The method combines influence factors such as word frequency, word length and word position into a weight formula for candidate keywords. The relatively optimal weight coefficients in this formula are obtained by experiment, and the weight formula is then applied to the candidate keyword scoring formula of the TextRank algorithm to extract text keywords. The accuracy, recall and F value of the proposed OPW-TextRank algorithm and of the TextRank algorithm were compared experimentally on single-text keyword extraction. The results show that OPW-TextRank achieves higher accuracy than TextRank when the window size is 6. The method is useful for natural language processing systems that rely on text keyword extraction.
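
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's OPW-TextRank formula, which the abstract does not state. The linear weight combination, the placeholder coefficients in `coeffs`, the use of first-occurrence position, and the function name `weighted_textrank` are all assumptions made for illustration; only the ingredients (word frequency, word length, word position, a co-occurrence window) come from the abstract.

```python
from collections import Counter, defaultdict


def weighted_textrank(words, window=6, coeffs=(0.4, 0.3, 0.3),
                      damping=0.85, iters=50):
    """Rank candidate words with a TextRank variant biased by word weights.

    `coeffs` = (frequency, length, position) are placeholder values,
    NOT the coefficients tuned in the paper. They must be positive so
    that every word weight is positive.
    """
    if not words:
        return []
    a, b, c = coeffs
    n = len(words)
    freq = Counter(words)
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)  # keep the earliest occurrence only
    max_freq = max(freq.values())
    max_len = max(len(w) for w in freq)

    # Assumed weight: linear mix of normalized frequency, length, and
    # earliness of first occurrence (earlier words score higher).
    weight = {
        w: a * freq[w] / max_freq
        + b * len(w) / max_len
        + c * (1 - first_pos[w] / n)
        for w in freq
    }

    # Undirected co-occurrence graph: two distinct words are linked
    # if they appear within `window` consecutive tokens.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, n)):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    # Each word splits its score among its neighbors in proportion
    # to the neighbors' weights, instead of uniformly as in TextRank.
    out_sum = {w: sum(weight[v] for v in neighbors[w]) for w in freq}
    score = {w: 1.0 for w in freq}
    for _ in range(iters):
        score = {
            w: (1 - damping) + damping * sum(
                weight[w] / out_sum[v] * score[v] for v in neighbors[w]
            )
            for w in freq
        }
    return sorted(score, key=score.get, reverse=True)
```

Calling `weighted_textrank(tokens, window=6)` on a tokenized, stop-word-filtered text returns the candidate words ordered from highest to lowest score; the top few entries would serve as the extracted keywords. Setting all weights equal reduces the sketch to unweighted TextRank over the same graph.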

Key words: Keyword extraction, TextRank, Weighting, Word frequency

CLC Number: TP391.1