Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 210300179-7. doi: 10.11896/jsjkx.210300179

• Artificial Intelligence •

Extractive Automatic Summarization Model Based on Knowledge Distillation

ZHAO Jiangjiang1, WANG Yang2, XU Yingying1, GAO Yang2

  1 China Mobile Online Services Company Limited, Beijing 100033, China;
  2 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Online: 2023-06-10  Published: 2023-06-12
  • Corresponding author: GAO Yang (gyang@bit.edu.cn)
  • About author: ZHAO Jiangjiang (zhaojiangjiang@cmos.chinamobile.com), born in 1987, Ph.D. candidate, is a member of China Computer Federation. His main research interests include open-domain dialogue systems, information extraction, network representation learning, and natural language processing.
    GAO Yang, born in 1987, Ph.D., associate professor, is a member of China Computer Federation. Her main research interests include information extraction, network representation learning, and natural language processing.

Abstract: The goal of extractive summarization is to select important sentences from the original document to form a short summary while preserving the document's key content. Query-focused extractive summarization models can further satisfy users' differing needs for summary content. Extractive models have the natural advantage of guaranteeing the correctness of the summary and the readability of its sentences; on that basis, ensuring the relevance and salience of the summary content becomes the key modeling objective. To build an extractive model that satisfies query relevance while preserving content salience, this paper treats the query information as a learning target, constructs an extended query-based summarization dataset from the title and image information of an existing summarization dataset, and proposes an extractive summarization model based on knowledge distillation. In the experiments, the pre-trained language model BERT is adopted as the encoder, and two training strategies grounded in knowledge distillation theory are proposed: guided training and distillation training. Experimental results on CNN/DailyMail, a public news summarization dataset, show that both training strategies are effective. The experiments also show that guided training effectively improves the salience of the summary content, while distillation training achieves the best results in improving both the relevance and the salience of the summary.
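As a concrete illustration of the BERT-encoder component described in the abstract, the sketch below follows a common BERTSUM-style setup (Liu and Lapata, 2019) rather than the authors' actual implementation: a [CLS] token is inserted before every sentence, BERT encodes the concatenated document, and a linear head turns each sentence's [CLS] vector into an extraction logit. The class name, checkpoint, and tokenization scheme are illustrative assumptions; query conditioning and the extended dataset construction are not shown.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class ExtractiveScorer(nn.Module):
    """Scores each sentence of a document for inclusion in the summary (illustrative sketch)."""
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # cls_positions: [batch, num_sentences] index of each sentence's [CLS] token.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        idx = cls_positions.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        sent_vecs = torch.gather(hidden, 1, idx)        # [batch, num_sentences, hidden]
        return self.classifier(sent_vecs).squeeze(-1)   # one logit per sentence

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentences = ["The cat sat on the mat.", "It rained all day."]
ids, cls_positions = [], []
for s in sentences:
    cls_positions.append(len(ids))                      # position of this sentence's [CLS]
    ids += ([tokenizer.cls_token_id]
            + tokenizer.encode(s, add_special_tokens=False)
            + [tokenizer.sep_token_id])

model = ExtractiveScorer()
logits = model(torch.tensor([ids]),
               torch.ones(1, len(ids), dtype=torch.long),
               torch.tensor([cls_positions]))
print(torch.sigmoid(logits))                            # per-sentence extraction probabilities
```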

Key words: Query-focused extractive summarization, Extended dataset, BERT, Knowledge distillation
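The abstract names two training strategies, guided training and distillation training, without specifying their loss functions here; as a generic point of reference, the following is a minimal sketch of a Hinton-style distillation objective (Hinton et al., 2015) that a student-teacher extractive setup could use, combining the teacher's temperature-softened sentence scores with the gold extraction labels. The temperature, weighting factor, and tensor shapes are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      gold_labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """student_logits / teacher_logits: [batch, num_sentences] sentence scores.
    gold_labels: [batch, num_sentences] binary extraction labels (1 = in summary)."""
    # Soft targets: the teacher's sentence-selection probabilities, softened by the temperature.
    soft_targets = torch.sigmoid(teacher_logits.detach() / temperature)
    # Distillation term: push the student's softened scores toward the teacher's.
    kd = F.binary_cross_entropy_with_logits(student_logits / temperature, soft_targets)
    # Supervised term: ordinary loss against the gold extraction labels.
    ce = F.binary_cross_entropy_with_logits(student_logits, gold_labels.float())
    # The T^2 factor keeps the two gradient magnitudes comparable (Hinton et al., 2015).
    return alpha * temperature ** 2 * kd + (1.0 - alpha) * ce

# Example: three sentences, the teacher strongly prefers the first two.
student = torch.zeros(1, 3, requires_grad=True)
teacher = torch.tensor([[3.0, 2.0, -4.0]])
labels = torch.tensor([[1, 1, 0]])
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(loss.item())
```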

CLC Number: TP391