Computer Science ›› 2020, Vol. 47 ›› Issue (6): 74-78. doi: 10.11896/jsjkx.190600006



Chinese Short Text Summary Generation Model Based on Semantic Awareness

NI Hai-qing, LIU Dan, SHI Meng-yu   

  1. Research Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Received: 2019-06-01  Online: 2020-06-15  Published: 2020-06-10
  • Corresponding author: LIU Dan (liudan@uestc.edu.cn)
  • About the authors: NI Hai-qing, born in 1994, postgraduate (nihaijing0520@163.com). His main research interests include text summarization.
    LIU Dan, born in 1969, Ph.D, associate professor. His main research interests include network and system security, cloud computing and data processing.


Abstract: Text summarization technology condenses the key information in massive data and effectively alleviates the problem of information overload. At present, the sequence-to-sequence model is widely used for English text summarization, but it has not been studied in depth for Chinese text summarization. In the conventional sequence-to-sequence model, the decoder, through the attention mechanism, treats the hidden states of the words output by the encoder as the complete semantic information of the source text when generating the summary. However, each of these hidden states only reflects the words immediately before and after the current word rather than the complete semantics of the source text, so the generated summary may miss the core information of the source text, which harms its accuracy and readability. To solve this problem, a semantic-aware Chinese short text summary generation model named SA-Seq2Seq is proposed, which builds on the sequence-to-sequence model with an attention mechanism. In the encoder, SA-Seq2Seq uses the pre-trained model BERT to introduce the Chinese short text as whole-text semantic information, so that every word representation carries the overall semantics; in the decoder, it uses the reference (gold) summary as the target semantic information to compute a semantic inconsistency loss, thereby ensuring the semantic completeness of the generated summary. Experiments are conducted on the Chinese short text summarization dataset LCSTS. The results show that SA-Seq2Seq improves significantly over the baseline models on the ROUGE metric: its ROUGE-1, ROUGE-2 and ROUGE-L scores increase by 3.4%, 7.1% and 6.1% respectively on the character-based version of the dataset, and by 2.7%, 5.4% and 11.7% respectively on the word-based version. Therefore, SA-Seq2Seq can more effectively integrate the overall semantic information of Chinese short texts, mine their key information, and ensure the fluency and coherence of the generated summary, making it applicable to the Chinese short text summarization task.
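
The page does not include the authors' implementation, so the block below is only a minimal PyTorch sketch of the idea the abstract describes: a BERT encoder supplies globally contextualized representations of the source text, an attention-based GRU decoder generates the summary, and an auxiliary semantic inconsistency loss pulls the semantics of the generated summary toward those of the reference summary. The class name SASeq2SeqSketch, the GRU decoder, the bert-base-chinese checkpoint, the weight lam and the cosine-distance form of the loss are illustrative assumptions, not the authors' code; the paper only states that the reference summary serves as the target semantic information.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel  # assumed dependency (HuggingFace Transformers)


class SASeq2SeqSketch(nn.Module):
    """Illustrative sketch: BERT encoder + attention-based GRU decoder."""

    def __init__(self, vocab_size, hidden=768):
        super().__init__()
        # BERT gives every source token a representation that already sees the
        # whole text, which is how the model injects overall semantics.
        self.encoder = BertModel.from_pretrained("bert-base-chinese")
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.out = nn.Linear(hidden * 2, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        enc = self.encoder(input_ids=src_ids, attention_mask=src_mask)
        enc_states = enc.last_hidden_state          # per-token, globally contextualized
        src_sem = enc.pooler_output                 # whole-text semantic vector
        # Teacher forcing: decode conditioned on the shifted reference summary.
        dec_in = self.embed(tgt_ids[:, :-1])
        dec_states, _ = self.decoder(dec_in, src_sem.unsqueeze(0).contiguous())
        ctx, _ = self.attn(dec_states, enc_states, enc_states,
                           key_padding_mask=~src_mask.bool())
        logits = self.out(torch.cat([dec_states, ctx], dim=-1))
        return logits, dec_states


def semantic_inconsistency_loss(dec_states, ref_sem):
    """Assumed form of the semantic inconsistency loss: cosine distance between
    the mean decoder state and the BERT semantics of the reference summary."""
    gen_sem = dec_states.mean(dim=1)
    return (1.0 - F.cosine_similarity(gen_sem, ref_sem, dim=-1)).mean()


def total_loss(logits, tgt_ids, dec_states, ref_sem, lam=0.5, pad_id=0):
    # Standard token-level cross entropy plus the weighted semantic term.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         tgt_ids[:, 1:].reshape(-1), ignore_index=pad_id)
    return ce + lam * semantic_inconsistency_loss(dec_states, ref_sem)
```

In this sketch, ref_sem would be obtained by encoding the reference summary with the same BERT model, and lam balances next-token accuracy against the semantic completeness of the whole summary.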

Key words: Attention mechanism, Chinese short text summarization, Pre-training model, Semantic awareness, Sequence-to-sequence model

CLC number: TP391.1