Computer Science ›› 2021, Vol. 48 ›› Issue (10): 59-66. doi: 10.11896/jsjkx.200900180

• Artificial Intelligence

Scientific Paper Summarization Using Word-Section Association

FU Ying, WANG Hong-ling, WANG Zhong-qing

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Received: 2020-09-24 Revised: 2021-01-04 Online: 2021-10-15 Published: 2021-10-18
  • Corresponding author: WANG Hong-ling (hlwang@suda.edu.cn)
  • Author email: 20184227030@stu.suda.edu.cn
  • About author: FU Ying, born in 1994, postgraduate, is a member of China Computer Federation. Her main research interests include natural language processing.
    WANG Hong-ling, born in 1975, professor. Her main research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61976146).

Abstract: With the development of science and technology, people need quick access to a large amount of scientific and technological information, and scientific papers are one of its main carriers. As an important part of a scientific paper, the abstract is an effective tool for readers to retrieve literature, so its quality directly affects how often the paper is retrieved. However, many authors lack writing experience, and the abstracts they write are often of low quality. Automatically generating a summary of a scientific paper can help authors grasp the paper's important content more effectively and write a high-quality abstract faster. The length of an automatically generated abstract can also be controlled, which lets it convey more content to readers and helps them understand the paper better. Summarization of scientific papers is therefore one of the research topics in automatic summarization. Compared with common news documents, scientific papers have a strong document structure and clear logical relationships. Mainstream abstractive summarization models based on the encoder-decoder framework mainly consider the sequential information in a document and rarely explore its discourse structure. To address this, and in view of the characteristics of scientific papers, this paper proposes an automatic summarization model based on a "word-section-document" hierarchical structure. The model uses the association between words and sections to strengthen the hierarchy of the text structure and the interaction between levels, so as to select the key information in a scientific paper. In addition, the model is extended with a context gate unit that updates and optimizes the context vector, so that context information is captured more comprehensively. Experimental results show that the proposed model effectively improves the generated summaries on the ROUGE evaluation metrics.

Key words: Abstractive summarization, Automatic summarization, Hierarchical structure, Scientific paper summarization, Text structure
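
The abstract reports results under the ROUGE evaluation method. As a rough illustration of what such a score measures, the following is a minimal sketch of ROUGE-N, assuming lowercased whitespace tokenization and a single reference summary. The function names `ngrams` and `rouge_n` are hypothetical, and the paper's actual evaluation toolkit and settings are not stated on this page.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) of n-gram overlap between two texts.

    Assumption: simple lowercase + whitespace tokenization, no stemming
    and no stopword removal, which official ROUGE tools may apply.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `rouge_n(generated_summary, author_abstract, n=2)` would give a ROUGE-2 style score for a generated summary against the author-written abstract, where both argument names stand for plain strings supplied by the caller.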

CLC Number: 

  • TP18