Computer Science (计算机科学), 2021, Vol. 48, Issue (10): 59-66. doi: 10.11896/jsjkx.200900180
付颖, 王红玲, 王中卿
FU Ying, WANG Hong-ling, WANG Zhong-qing
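Abstract: Automatically generating abstracts for scientific papers helps authors write abstracts faster and is one of the research topics in automatic summarization. Compared with common news documents, scientific papers have a strong document structure and clear logical relationships. Current mainstream encoder-decoder abstractive summarization models mainly consider the sequential information of a document and rarely explore its discourse structure in depth. To address this, and targeting the characteristics of scientific papers, this paper proposes an automatic summarization model based on a "word-section-document" hierarchical structure. The model uses the association between words and sections to strengthen both the hierarchy of the text structure and the interaction between levels, and thereby selects the key information of a scientific paper. In addition, the model adds a context gating unit that updates and refines the context vector so that contextual information is captured more fully. Experimental results show that the proposed model effectively improves the generated summaries on all ROUGE metrics.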
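The abstract does not give the model's equations, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes dot-product attention, word weights rescaled by the attention weight of the section that contains each word, and a simple sigmoid gate applied to the context vector. The names `hierarchical_context` and `gate_context`, the gate form, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hierarchical_context(word_states, section_states, query):
    """Assumed word-section attention (illustrative, not the paper's formula).

    word_states: one list of word vectors per section.
    section_states: one vector per section.
    query: the current decoder state.
    """
    # Section-level attention over the whole document.
    section_weights = softmax([dot(s, query) for s in section_states])
    # Word-level attention over all words, then rescaled by the weight
    # of the section each word belongs to and renormalised.
    flat_words = [w for words in word_states for w in words]
    owners = [i for i, words in enumerate(word_states) for _ in words]
    word_probs = softmax([dot(w, query) for w in flat_words])
    scaled = [p * section_weights[o] for p, o in zip(word_probs, owners)]
    total = sum(scaled)
    word_weights = [s / total for s in scaled]
    # Context vector: attention-weighted sum of the word vectors.
    context = [
        sum(a * w[d] for a, w in zip(word_weights, flat_words))
        for d in range(len(query))
    ]
    return context, word_weights, section_weights


def gate_context(context, query, weights, bias):
    """Assumed context gate: g = sigmoid(W [c; q] + b), output g * c."""
    joined = context + query  # list concatenation, i.e. [c; q]
    gate = [
        1.0 / (1.0 + math.exp(-(dot(row, joined) + b)))
        for row, b in zip(weights, bias)
    ]
    return [g * c for g, c in zip(gate, context)]


# Toy document: two sections with two words and one word respectively.
word_states = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0]]]
section_states = [[0.75, 0.25], [0.0, 1.0]]
query = [1.0, 0.0]

context, word_weights, section_weights = hierarchical_context(
    word_states, section_states, query
)
# Untrained gate parameters (all zeros) give a gate value of 0.5 everywhere.
gated = gate_context(context, query, [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
```

In this toy input the first section matches the query better, so its words receive more attention than they would from word-level scores alone. In a trained model the gate parameters would be learned rather than fixed at zero.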