Computer Science, 2023, Vol. 50, Issue (6A): 210300179-7. doi: 10.11896/jsjkx.210300179
ZHAO Jiangjiang1, WANG Yang2, XU Yingying1, GAO Yang2
Abstract: Extractive summarization aims to form a short summary by selecting the important sentences of a source document, thereby preserving its key content. Query-focused extractive summarization models can further satisfy users' differing requirements for summary content. Extractive models have the natural advantage of guaranteeing the factual correctness and readability of the selected sentences; on that basis, ensuring the relevance and salience of the summary content becomes the key objective. To build an extractive model that satisfies query relevance while preserving content salience, this work treats the query information as a learning target: an extended query-based summarization dataset is additionally constructed from the title and image information of an existing summarization dataset, and an extractive summarization model based on knowledge distillation is proposed. The experiments adopt the pre-trained language model BERT as the encoder and, drawing on knowledge-distillation theory, propose two training strategies: guided training and distilled training. Results on the public CNN/DailyMail news summarization dataset show that both training strategies are effective. The experiments further show that guided training effectively improves the salience of the summaries, while the distilled model achieves the best results in improving both relevance and salience.
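The distillation setup described in the abstract, in which a student scorer learns from a teacher's soft sentence scores alongside the gold extractive labels, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name `kd_loss`, the temperature `T`, and the mixing weight `alpha` are assumptions, and sentence relevance is modeled as an independent binary score per sentence.

```python
import math

def kd_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Sentence-level knowledge-distillation loss for extractive scoring.

    Each logit is a model's score that the corresponding sentence
    belongs in the summary; hard_labels are the gold 0/1 extractive labels.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, hard_labels):
        p_t = sigmoid(t / T)  # teacher's temperature-softened probability
        p_s = sigmoid(s / T)  # student's temperature-softened probability
        # Soft term: cross-entropy of the student against the teacher's
        # soft target (the standard Hinton-style distillation signal).
        soft = -(p_t * math.log(p_s) + (1.0 - p_t) * math.log(1.0 - p_s))
        # Hard term: binary cross-entropy against the gold label.
        q_s = sigmoid(s)
        hard = -(y * math.log(q_s) + (1.0 - y) * math.log(1.0 - q_s))
        # T*T rescales the soft gradient, as is conventional in distillation.
        total += alpha * (T * T) * soft + (1.0 - alpha) * hard
    return total / len(student_logits)
```

A student whose scores agree with both the teacher and the gold labels incurs a small loss; disagreement on either signal raises it, which is the mechanism that lets the query-aware teacher guide the extractive student.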