Computer Science, 2015, Vol. 42, Issue (4): 226-229. doi: 10.11896/j.issn.1002-137X.2015.04.046

• Artificial Intelligence •

Automatic Text Summarization Based on Comprehensive Characteristics of Sentence

CHENG Yuan, Wushouer SILAMU and Maimaitiyiming HASIMUA

  1. Xinjiang Laboratory of Multi-Language Information Technology, College of Information Science and Engineering, Xinjiang University, Urumqi 830046; Department of Computer Science, Hotan Teachers College, Hotan 848000
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Basic Research Program of China (973 Program) (2014CB340506)

Abstract: This paper uses a comprehensive automatic text summarization method to extract summaries that have wide coverage and little redundancy and that best reflect the main idea of the text. The method takes into account word frequency, the title, sentence position, cue words, indicative phrases, sentence similarity, and other features of the text. It constructs a comprehensive feature weighting function, trains on the corpus with a mathematical regression model, removes redundant sentence information, and extracts key sentences to generate the summary. Experimental evaluation demonstrates the feasibility and effectiveness of the method and its advantage in summary quality.

Key words: Automatic abstract, Features, Comprehensive, Weighting function
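
The pipeline described in the abstract (score each sentence with a weighted combination of features, then drop redundant sentences) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the weight values in `WEIGHTS`, the cue-word list `CUE_WORDS`, the feature normalizations, the function names, and the `max_sim` threshold are all assumptions made for illustration. In particular, the paper fits its weights with a regression model, whereas the weights here are simply fixed by hand, and the sketch tokenizes English text only.

```python
import math
import re
from collections import Counter

# Hypothetical values: the paper learns its weights by regression on a corpus.
WEIGHTS = {"tf": 0.4, "title": 0.3, "position": 0.2, "cue": 0.1}
CUE_WORDS = {"conclusion", "summary", "propose", "results"}  # assumed list


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(title, sentences):
    """Return one weighted feature score per sentence."""
    tokens = [tokenize(s) for s in sentences]
    doc_tf = Counter(t for toks in tokens for t in toks)
    max_tf = max(doc_tf.values(), default=1)
    title_terms = set(tokenize(title))
    n = len(sentences)
    scores = []
    for i, toks in enumerate(tokens):
        if not toks:
            scores.append(0.0)
            continue
        # Average document frequency of the sentence's terms, scaled to [0, 1].
        tf = sum(doc_tf[t] for t in toks) / (len(toks) * max_tf)
        # Fraction of title terms that appear in the sentence.
        title_f = len(title_terms & set(toks)) / len(title_terms) if title_terms else 0.0
        # Earlier sentences score higher.
        position = 1.0 - i / n
        # 1 if the sentence contains any cue word, else 0.
        cue = 1.0 if CUE_WORDS & set(toks) else 0.0
        scores.append(
            WEIGHTS["tf"] * tf
            + WEIGHTS["title"] * title_f
            + WEIGHTS["position"] * position
            + WEIGHTS["cue"] * cue
        )
    return scores


def summarize(title, sentences, k=2, max_sim=0.5):
    """Pick up to k top-scoring sentences, skipping near-duplicates."""
    scores = score_sentences(title, sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        toks = tokenize(sentences[i])
        # Redundancy removal: reject sentences too similar to ones already chosen.
        if all(cosine(toks, tokenize(sentences[j])) <= max_sim for j in chosen):
            chosen.append(i)
    # Restore original document order for readability.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(title, sentences, k=2)` returns the selected sentences in document order. A sentence that repeats an already selected one is skipped by the similarity check even when its own score is high.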

