Computer Science ›› 2010, Vol. 37 ›› Issue (5): 151-154.

• Databases and Data Mining •

Automatic Abstracting System Based on Improved LexRank Algorithm

JI Wen-qian, LI Zhou-jun, CHAO Wen-han, CHEN Xiao-ming

  1. (School of Computer Science, Beihang University, Beijing 100191)
  • Online: 2018-12-01 Published: 2018-12-01
  • Funding:
    This work was supported by the National Natural Science Foundation of China (60573057, 60473057, 90604007).

Abstract: Automatic summarization is a major research topic in computational linguistics, and its study and application have attracted wide attention from related disciplines such as computer science, linguistics, and information science. This article first introduces the LexRank-based approach to automatic summarization. To address the shortcomings of that approach, it is improved in three respects: sentence similarity computation, sentence weight computation, and redundancy handling, so that the relevant influence factors can be adjusted dynamically according to the content of the input documents. The implemented system performs single-document and multi-document summarization for both Chinese and English text. Experiments on the evaluation corpora from Harbin Institute of Technology (HIT) and DUC show that the system improves summary quality to some extent over the original LexRank algorithm, and that in multi-document summarization it is fairly insensitive to noise in the data, such as noise resulting from imperfect topical clustering of documents. Finally, open problems in automatic summarization research are discussed and research trends are outlined.

Key words: Automatic abstracting, LexRank, Sentence similarity, Dynamic adjustment, Redundancy resolution
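
The abstract does not spell out the improved method, so it cannot be reproduced here. As a point of reference only, the following is a minimal Python sketch of a baseline, thresholded LexRank-style ranking followed by a greedy redundancy filter. The whitespace tokenizer, the function names (`bag`, `cosine`, `lexrank`, `summarize`), and the default values of `threshold`, `damping`, and `redundancy` are illustrative assumptions, not the authors' system; Chinese input would additionally need word segmentation.

```python
import math
from collections import Counter


def bag(sentence):
    # Assumption: whitespace tokenisation (Chinese text needs a segmenter first).
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    """Score sentences by power iteration on a thresholded similarity graph."""
    bags = [bag(s) for s in sentences]
    n = len(bags)
    if n == 0:
        return []
    # Connect two sentences when their similarity reaches the threshold.
    adj = [[1.0 if cosine(bags[i], bags[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalise so each row is a probability distribution
    # (uniform fallback for an empty sentence with no edges at all).
    for row in adj:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(scores[i] * adj[i][j] for i in range(n))
                  for j in range(n)]
    return scores


def summarize(sentences, k=2, redundancy=0.5):
    """Pick up to k high-scoring sentences, skipping near-duplicates."""
    scores = lexrank(sentences)
    bags = [bag(s) for s in sentences]
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        if len(chosen) >= k:
            break
        # Greedy redundancy filter: reject sentences too similar to a chosen one.
        if all(cosine(bags[i], bags[j]) < redundancy for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]  # restore document order
```

Calling `summarize(sentences, k=3)` would return at most three sentences in their original order. In this sketch the three stages named in the abstract (similarity, weighting, redundancy handling) are separate functions, so any one of them could be swapped for a different scheme without touching the others.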
