Computer Science ›› 2021, Vol. 48 ›› Issue (10): 44-50.doi: 10.11896/jsjkx.200900082

• Artificial Intelligence •

Graph Based Collaborative Extraction Method for Keywords and Summary from Documents

MAO Xiang-ke1,2,3, HUANG Shao-bin1, YU Qin-yong2,3   

  1 College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
    2 CETC Big Data Research Institute Co., Ltd., Guiyang 550022, China
    3 Big Data Application on Improving Governance Capabilities National Engineering Laboratory, Guiyang 550022, China
  • Received: 2020-09-10 Revised: 2021-03-10 Online: 2021-10-15 Published: 2021-10-18
  • About author: MAO Xiang-ke, born in 1992, Ph.D. His main research interests include natural language processing and machine learning.
    HUANG Shao-bin, born in 1965, professor. His main research interests include data mining, natural language processing and machine learning.
  • Supported by:
    Big Data Application on Improving Governance Capabilities National Engineering Laboratory Open Fund Project.

Abstract: Keyword extraction and summary extraction both aim to select key content from a document that expresses its main meaning. The quality of the extracted keywords and summaries is judged mainly by whether they cover the main topics of the document. Existing graph-based methods, however, rarely perform keyword extraction and summary extraction jointly. This paper proposes a graph-based method that extracts keywords and a summary simultaneously. The method first constructs a graph from six relationships among the words, topics and sentences of a document: word-word, topic-topic, sentence-sentence, word-topic, topic-sentence and word-sentence. It then uses statistical features of the words and sentences in the document to estimate the prior importance of each vertex in the graph, scores the words and sentences iteratively, and finally selects the keywords and the summary according to these scores. To verify the effectiveness of the proposed method, keyword extraction and summary extraction experiments are carried out on Chinese and English datasets. The results show that the proposed method achieves good results on both the keyword extraction and the summary extraction tasks.
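The pipeline outlined in the abstract (build a word/topic/sentence graph, assign vertex priors, score iteratively, then pick the top words and sentences) can be illustrated with a minimal sketch. The abstract does not specify the topic model, the edge weights or the prior features, so everything below is an assumption: topics are passed in as hypothetical sets of words, edges are weighted by simple co-occurrence and overlap counts, word priors are term frequencies and sentence priors favor earlier sentences, and the iterative scoring is a personalized PageRank-style random walk standing in for the paper's own update rule. The names `co_rank` and `extract` are illustrative only.

```python
from collections import Counter
from itertools import combinations


def co_rank(sentences, topics, damping=0.85, iters=50):
    """Jointly score words, topics and sentences on one heterogeneous graph.

    sentences: list of token lists; topics: dict mapping a hypothetical
    topic id to a set of words. Edge weights and priors are assumptions.
    """
    words = sorted({w for s in sentences for w in s})
    vocab = set(words)
    nodes = ([("w", w) for w in words] + [("t", t) for t in topics]
             + [("s", i) for i in range(len(sentences))])
    idx = {node: k for k, node in enumerate(nodes)}
    n = len(nodes)
    W = [[0.0] * n for _ in range(n)]

    def link(a, b, weight):
        # Undirected weighted edge; zero weights and self-loops are skipped.
        if a != b and weight > 0:
            W[idx[a]][idx[b]] += weight
            W[idx[b]][idx[a]] += weight

    sets = [set(s) for s in sentences]
    for s in sets:                                  # word-word: co-occurrence in a sentence
        for a, b in combinations(sorted(s), 2):
            link(("w", a), ("w", b), 1.0)
    for a, b in combinations(topics, 2):            # topic-topic: shared words
        link(("t", a), ("t", b), len(topics[a] & topics[b]))
    for i, j in combinations(range(len(sets)), 2):  # sentence-sentence: word overlap
        link(("s", i), ("s", j), len(sets[i] & sets[j]))
    for t, tw in topics.items():
        for w in tw & vocab:                        # word-topic: membership
            link(("w", w), ("t", t), 1.0)
        for i, s in enumerate(sets):                # topic-sentence: topic words in sentence
            link(("t", t), ("s", i), len(tw & s))
    for i, s in enumerate(sentences):               # word-sentence: occurrence count
        for w, c in Counter(s).items():
            link(("w", w), ("s", i), c)

    # Prior importance (assumption): term frequency for words, earlier
    # position for sentences, uniform for topics; normalized to sum to 1.
    tf = Counter(w for s in sentences for w in s)
    prior = [float(tf[key]) if kind == "w" else 1.0 / (key + 1) if kind == "s" else 1.0
             for kind, key in nodes]
    total = sum(prior)
    prior = [p / total for p in prior]

    # Iterative scoring: a random walk with restart biased toward the priors.
    out = [sum(row) for row in W]
    score = prior[:]
    for _ in range(iters):
        score = [(1 - damping) * prior[i]
                 + damping * sum(W[j][i] / out[j] * score[j] for j in range(n) if out[j] > 0)
                 for i in range(n)]
    return {node: score[k] for k, node in enumerate(nodes)}


def extract(sentences, topics, n_words=5, n_sents=2):
    """Return the top-scoring words and the indices of the top sentences."""
    scores = co_rank(sentences, topics)
    ranked = sorted(scores, key=scores.get, reverse=True)
    keywords = [key for kind, key in ranked if kind == "w"][:n_words]
    summary = sorted([key for kind, key in ranked if kind == "s"][:n_sents])  # document order
    return keywords, summary
```

For example, `extract(sentences, topics, n_words=5, n_sents=2)` returns the five highest-scoring words and the indices of the two highest-scoring sentences in document order. Placing all three vertex types in a single graph is what lets the two tasks inform each other in this sketch: a sentence gains score from the words and topics it contains, and a word gains score from the sentences and topics it appears in.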

Key words: Extractive summarization, Graph model, Keyword extraction, Topic coverage

CLC Number: TP311.131