基于非完全吸收马尔科夫链的多文档自动文摘算法

计算机科学 ›› 2013, Vol. 40 ›› Issue (5): 201-205.

基于非完全吸收马尔科夫链的多文档自动文摘算法

高晶,房俊

北方工业大学云计算研究中心北京100144;北方工业大学云计算研究中心北京100144

出版日期:2018-11-16 发布日期:2018-11-16
基金资助:
本文受国家自然基金重点项目(61033006),国家自然基金项目(60970131)资助

Partial Absorbing Markov Chain Based Multi-document Summarization

GAO Jing and FANG Jun

Online:2018-11-16 Published:2018-11-16

摘要/Abstract

摘要： 吸收马尔科夫链模型在自动文摘领域的有效性已经证实。然而,此模型中的平均期望历经次数需要通过矩阵求逆得到,所以模型的时间复杂度很高。此外,由于自身的局限性,它也无法利用除句子间相互关系以外的其它信息。针对此问题建立了一个新的模型:非完全吸收马尔科夫链；并以此为基础提出了一个新的多文档文摘算法。证明了吸收马尔科夫链的平均期望历经次数与对应的非完全吸收马尔科夫链的稳态概率分布的等价性,而后者可通过迭代求解。同时,这个新的模型还可以引入除句子间相互关系以外的其它信息,从而生成更准确的文摘。在TAC2011上的实验证实了该模型的有效性。

关键词: 非完全吸收马尔科夫链,LexRank,面向主题的先验分布,多文档自动文摘

Abstract: Absorbing Markov Chain has been proven to be effective in text summarization．However,the algorithm based on Absorbing Markov Chain is not only time-consuming due to matrix inversion but also inept to integrate other information except relationships among sentences because of the limitation of the model．This paper presents a novel multi-document summarization approach based on Partial Absorbing Markov Chain．The equivalent relationship between the average expected visits in Absorbing Markov Chain and the stationary probability in the corresponding Partial Absorbing Markov Chain was demonstrated．Then,the stationary probability in Partial Absorbing Markov Chain which is easily calculated serves as a criterion to rank sentences．In addition,other kinds of information are incorporated together to generate a more accurate solution of the stationary probability．Experiments on TAC2011main task are performed.

Key words: Partial absorbing markov chain,LexRank,Topic-oriented prior distribution,Multi-document summarization

高晶,房俊. 基于非完全吸收马尔科夫链的多文档自动文摘算法[J]. 计算机科学, 2013, 40(5): 201-205. https://doi.org/

GAO Jing and FANG Jun. Partial Absorbing Markov Chain Based Multi-document Summarization[J]. Computer Science, 2013, 40(5): 201-205. https://doi.org/

参考文献

[1] Yu P S,Tsotras V J,Fox E A.et al.A system for query-specific document summarization[C]∥Proceedings of the 2006International Conference on Information and Knowledge Management．New York:ACM Press,2006:622-631
[2] Erkan G,Radev D．LexRank:prestige in multi-document textsummarization[C]∥Proceedings of the 2004Conference on Empirical Methods in Natural Language Processing．Morristown,NJ:ACM Press,2004:365-371
[3] Carbonell J,Goldstein J．The Use of MMR,Diversity-BasedReranking for Reordering Documents and Producing Summaries[C]∥Proceedings of the 21st ACM-SIGIR International Conferen-ce on Research and Development in Information Retrieval．New York,NY:ACM Press,1998:335-336
[4] Radev D．A common theory of information fusion from multiple text sources,step one:Cross-document structure[C]∥Procee-dings of the 1st ACL SIGDIAL Workshop on Discourse and Dialogue．Morristown,NJ:ACL,2000:74-83
[5] Zhai C X,Cohen W W,Lafferty J．Beyond independent rele-vance:Methods and evaluation metrics for subtopic retrieval[C]∥Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval．New York,NY:ACM Press,2003:10-17
[6] Zhang B Z,Li H,Liu Y,et al．Improving web search results using affinity graph[C]∥Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval．New York,NY:ACM Press,2005:504-511
[7] Zhang J,Xu H B,Cheng X Q．GSPSummary:A Graph-basedSub-topic Partition Algorithm for Summarization[C]∥Procee-dings of 2008Asia Information Retrieval Symposium．AIRS,2008:321-334
[8] Zhu X J,Goldberg A,Gael J V．et al．Improving diversity in ranking using absorbing random walks:Human Language Technologies[C]∥The Annual Conference of the North American Chapter of the Association for Computational Linguistics．Roche-ster,NY,NAACL-HLT,2007
[9] Lin C Y,Hovy E H.From Single to Multi-document Summarization:A Prototype System and its Evaluation[C]∥Proceedings of the 40th Annual Meeting of the Association for ComputationalLinguistics．Morristown,NJ:Association for Computational Linguistics,2002:25-34
[10] Harabagiu S,Lacatusu F．Topic themes for multi-documentsummarization[C]∥Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval．New York,NY:ACM Press,2005:202-209
[11] Saggion H,Bontcheva K,Cunningham H．Robust generic andquery-based summarization[C]∥Proceedings of 10th Confe-rence of the European Chapter of the Association for Computational Linguistics．Morristown,NJ:ACL,2003:235-238
[12] Conroy J M,Schlesinger J D．CLASSY query-based multi-document summarization[C]∥Proceedings of Document Understanding Conference．Vancouver,Canada,2005
[13] Page L,Brin S,Motwani R,et al．The PageRank Citation Ran-king:Bringing Order to the Web [R]．Stanford,California:Stanford University Database Group,1998
[14] Motwani R,Raghavan P.Randomized algorithms[M]．Cam-bridge Press,1996,8:33-37
[15] Haveliwala T.Efficient computation of pagerank[R]．Stanford,California:Database Group,Computer Science Department,Stanford University,1999

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed