Computer Science ›› 2022, Vol. 49 ›› Issue (12): 301-304. doi: 10.11896/jsjkx.210600166

• Artificial Intelligence •

GAN and Chinese WordNet Based Text Summarization Technology

LIU Xiao-ying, WANG Huai, WU Jisiguleng   

  1. Network Security Group, North China Institute of Computing Technology, Beijing 100083, China
  • Received: 2021-06-21  Revised: 2021-09-14  Published: 2022-12-14
  • Corresponding author: WANG Huai (498929906@qq.com)
  • About author: LIU Xiao-ying (xiaoying81@126.com), born in 1981, Ph.D, senior engineer, is a member of China Computer Federation. Her main research interests include natural language processing, artificial intelligence and network security. WANG Huai, born in 1996, master, engineer. His main research interests include network security, threat intelligence analysis and knowledge graph.
  • Supported by:
    National Key R & D Program of China (2018YFC0831200).


Abstract: Since the introduction of neural networks, text summarization techniques have attracted growing attention from researchers. Generative adversarial networks (GANs), which can extract text features or learn the distribution of an entire sample set and generate correlated sample points from it, are gradually replacing traditional sequence-to-sequence (Seq2seq) models for this task. In this paper, we exploit these properties of GANs and apply them to abstractive text summarization. The proposed adversarial model has three components: a generator, which encodes the input sentences into a shorter text representation; a readability discriminator, which forces the generator to produce highly readable summaries; and a similarity discriminator, which acts on the generator to suppress irrelevance between the generated summary and the input text. In addition, the similarity discriminator uses the Chinese WordNet as an external knowledge base to strengthen its discriminative power. The generator is optimized with a policy gradient algorithm, casting the problem as reinforcement learning. Experimental results show that the proposed model achieves high ROUGE scores.
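The training signal described in the abstract — a generator rewarded jointly by a readability discriminator and a similarity discriminator, and updated by policy gradient (REINFORCE) — can be sketched as a toy. This is a minimal illustration under stated assumptions, not the paper's implementation: the stand-in discriminator scores, the one-logit-per-token categorical "generator", and all function names below are invented for demonstration.

```python
import math
import random

def readability_score(summary):
    """Stand-in for the readability discriminator: prefer short tokens."""
    if not summary:
        return 0.0
    return sum(1.0 for tok in summary if len(tok) <= 4) / len(summary)

def similarity_score(summary, source):
    """Stand-in for the similarity discriminator: plain token overlap with
    the source (a real system would also consult Chinese WordNet synsets)."""
    if not summary:
        return 0.0
    src = set(source)
    return sum(1.0 for tok in summary if tok in src) / len(summary)

def reward(summary, source, alpha=0.5):
    """Combined reward fed back to the generator, in [0, 1]."""
    return alpha * readability_score(summary) + (1 - alpha) * similarity_score(summary, source)

def softmax(logits):
    m = max(logits.values())
    exp = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exp.values())
    return {t: v / z for t, v in exp.items()}

def reinforce_step(logits, source, lr=0.1, n_tokens=3, baseline=0.0, rng=random):
    """Sample a summary from a categorical policy, then apply one REINFORCE
    update: theta += lr * (R - baseline) * grad log pi(summary)."""
    probs = softmax(logits)
    tokens, weights = zip(*probs.items())
    summary = rng.choices(tokens, weights=weights, k=n_tokens)
    r = reward(summary, source)
    for t in logits:
        # Gradient of log pi for a categorical: (1 - p) if sampled, else -p,
        # summed over the sampled positions.
        grad = sum((1.0 if t == s else 0.0) - probs[t] for s in summary)
        logits[t] += lr * (r - baseline) * grad
    return summary, r
```

Because sampling a discrete summary is non-differentiable, the reward enters only through the score-function (policy gradient) term — the same reason the paper casts the problem as reinforcement learning.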

Key words: Text summarization, Generative adversarial network, WordNet, Reinforcement learning, Natural language processing
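The role of an external lexical resource in the similarity discriminator can also be sketched: with synset information, two different surface tokens count as a match when they share a sense, so the discriminator credits paraphrases rather than only exact string overlap. The tiny `SYNSETS` table below is made up for demonstration; a real system would query actual Chinese WordNet synsets.

```python
# Hypothetical three-concept synonym table standing in for the Chinese WordNet.
SYNSETS = {
    "汽车": {"car"}, "轿车": {"car"},
    "医生": {"doctor"}, "大夫": {"doctor"},
    "高兴": {"happy"}, "快乐": {"happy"},
}

def related(a, b):
    """True if two tokens are identical or share at least one synset."""
    if a == b:
        return True
    return bool(SYNSETS.get(a, set()) & SYNSETS.get(b, set()))

def wordnet_similarity(summary, source):
    """Fraction of summary tokens that match some source token, where a
    'match' is synset-aware rather than string-exact."""
    if not summary:
        return 0.0
    hits = sum(1 for s in summary if any(related(s, t) for t in source))
    return hits / len(summary)
```

Under this scheme a summary using 轿车 still scores against a source that says 汽车, which is the kind of discrimination an exact-overlap metric would miss.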

CLC Number: TP391