Computer Science (计算机科学), 2021, Vol. 48, Issue 10: 44-50. doi: 10.11896/jsjkx.200900082
Authors: MAO Xiang-ke (毛湘科; affiliations 1, 2, 3), HUANG Shao-bin (黄少滨; affiliation 1), YU Qin-yong (余秦勇; affiliations 2, 3)
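Abstract: Keyword extraction and extractive summarization share the same goal: to select the key content of a source document and condense its main ideas. The quality of extracted keywords and summaries is judged mainly by how well they cover the document's topics. Few existing graph-based methods for keyword extraction and summarization perform the two tasks jointly; this paper instead proposes a graph-based method that extracts keywords and a summary collaboratively. The method first builds a graph from six kinds of relations among the words, topics, and sentences of a document: word-word, topic-topic, sentence-sentence, word-topic, topic-sentence, and word-sentence. It then uses statistical features of the document's words and sentences to estimate the prior importance of each vertex in the graph. Next, it scores words and sentences iteratively. Finally, it selects the keywords and the summary according to the word and sentence scores. To validate the method, keyword extraction and summarization experiments were carried out on Chinese and English datasets, and the method achieved good results on both tasks.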
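The four steps in the abstract can be sketched as a single random walk over a joint word/topic/sentence graph. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives neither the edge weights, the prior features, nor the update rule. The sketch therefore assumes co-occurrence counts and cosine similarity as edge weights, term frequency and sentence position as priors, and a personalized-PageRank-style update with damping `d = 0.85`. The word-to-topic map `word_topic` is assumed to be precomputed (for example by a topic model), and the function names `build_graph`, `priors`, `co_rank`, and `extract` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(sentences, word_topic):
    """Weighted undirected graph over word ("w"), topic ("t") and sentence ("s")
    nodes covering the six relation types; all edge weights are assumptions."""
    adj = defaultdict(lambda: defaultdict(float))

    def link(a, b, weight):
        # Only positive edges are stored, so every node has a nonzero degree.
        if a != b and weight > 0:
            adj[a][b] += weight
            adj[b][a] += weight

    bags = [Counter(s) for s in sentences]
    for i, bag in enumerate(bags):
        words = sorted(bag)
        topics = Counter()
        for w in words:
            link(("w", w), ("s", i), bag[w])            # word-sentence: count in sentence
            if w in word_topic:
                topics[word_topic[w]] += bag[w]
        for x, w1 in enumerate(words):
            for w2 in words[x + 1:]:
                link(("w", w1), ("w", w2), 1.0)          # word-word: co-occur in a sentence
        for t, c in topics.items():
            link(("t", t), ("s", i), c)                  # topic-sentence: topic's word count
        ts = sorted(topics)
        for x, t1 in enumerate(ts):
            for t2 in ts[x + 1:]:
                link(("t", t1), ("t", t2), 1.0)          # topic-topic: co-occur in a sentence
    vocab = {w for s in sentences for w in s}
    for w in sorted(vocab):
        if w in word_topic:
            link(("w", w), ("t", word_topic[w]), 1.0)    # word-topic: membership
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            link(("s", i), ("s", j), cosine(bags[i], bags[j]))  # sentence-sentence
    return adj


def priors(sentences, adj):
    """Assumed statistical priors: term frequency for words, 1/(position+1) for
    sentences, uniform for topics. Each node type gets an equal share of the mass."""
    tf = Counter(w for s in sentences for w in s)
    raw = {}
    for kind, key in adj:
        if kind == "w":
            raw[(kind, key)] = float(tf[key])
        elif kind == "s":
            raw[(kind, key)] = 1.0 / (key + 1)
        else:
            raw[(kind, key)] = 1.0
    totals = Counter()
    for (kind, _), v in raw.items():
        totals[kind] += v
    return {n: v / totals[n[0]] / len(totals) for n, v in raw.items()}


def co_rank(adj, prior, d=0.85, tol=1e-8, max_iter=200):
    """Iterate score = (1 - d) * prior + d * (score spread along weighted edges)."""
    out = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    score = dict(prior)
    for _ in range(max_iter):
        new = {n: (1 - d) * prior[n] for n in adj}
        for n, nbrs in adj.items():
            share = d * score[n] / out[n]  # out[n] > 0 because link() skips zero weights
            for m, w in nbrs.items():
                new[m] += share * w
        delta = sum(abs(new[n] - score[n]) for n in adj)
        score = new
        if delta < tol:
            break
    return score


def extract(sentences, word_topic, k_words=5, k_sents=2):
    """Top-k words as keywords; top-k sentences returned in document order."""
    adj = build_graph(sentences, word_topic)
    score = co_rank(adj, priors(sentences, adj))
    ranked = sorted(score, key=score.get, reverse=True)
    keywords = [key for kind, key in ranked if kind == "w"][:k_words]
    top = sorted([key for kind, key in ranked if kind == "s"][:k_sents])
    return keywords, [sentences[i] for i in top]
```

For example, `extract(sentences, word_topic, k_words=2, k_sents=1)` on a pre-tokenized toy document returns two keywords and a one-sentence summary. Running one walk over the joint graph lets highly ranked words raise the sentences that contain them and vice versa, which mirrors the joint, iterative scoring the abstract describes; the weights and priors here are placeholders to tune, not values from the paper.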