Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 210300179-7. doi: 10.11896/jsjkx.210300179

• Artificial Intelligence •

Extractive Automatic Summarization Model Based on Knowledge Distillation

ZHAO Jiangjiang1, WANG Yang2, XU Yingying1, GAO Yang2   

  1. China Mobile Online Services Company Limited, Beijing 100033, China;
    2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Online: 2023-06-10  Published: 2023-06-12
  • About author: ZHAO Jiangjiang, born in 1987, Ph.D. candidate, is a member of the China Computer Federation. His main research interests include open-domain dialog systems, information extraction, network representation learning, and natural language processing. GAO Yang, born in 1987, Ph.D., associate professor, is a member of the China Computer Federation. Her main research interests include information extraction, network representation learning, and natural language processing.

Abstract: The objective of extractive summarization is to select the important sentences from the original text to form a short summary while retaining the source's key content. A query-focused extractive summarization model can further satisfy users' differing needs for summary content. Extractive models have the natural advantage of guaranteeing the correctness of the summary and the readability of its sentences; on this basis, ensuring the relevance and importance of the summary content is the key to the model's goal. To satisfy query relevance while ensuring the importance of the summary content, this paper takes query information as a learning target for the model, builds an extended summarization dataset based on title and image information, and proposes an extractive summarization model based on knowledge distillation. In the experiments, the pre-trained language model BERT is adopted as the encoder, and two training strategies grounded in knowledge distillation theory are proposed: guided training and distillation training. Experimental results on CNN/DailyMail, a publicly available news summarization dataset, show that both training methods achieve significant improvements. It is also found that the guided-training model effectively improves the importance of the summary content, while the distillation-training model achieves the best results in improving both the relevance and the importance of the summary.
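This page does not include implementation details, so the snippet below is an illustrative sketch only: one common way to realize a distillation objective for extractive summarization, where a student's per-sentence selection scores are trained against both a teacher's softened scores and the hard oracle extraction labels. All names (distillation_loss, temperature, alpha) are hypothetical, and the loss form follows standard knowledge-distillation practice rather than the authors' exact method.

```python
# Illustrative sketch (not the authors' code): a distillation loss for
# extractive summarization. The teacher would be the query-informed model
# trained on the extended dataset; the student sees only the document.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      hard_labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Combine a soft loss against the teacher's sentence-selection scores
    with a hard loss against the extractive oracle labels.

    student_logits, teacher_logits: [batch, num_sentences] raw scores.
    hard_labels: [batch, num_sentences] 0/1 oracle extraction labels.
    """
    # Soft targets: temperature-scaled per-sentence probabilities from the teacher.
    soft_student = torch.sigmoid(student_logits / temperature)
    soft_teacher = torch.sigmoid(teacher_logits / temperature)
    soft_loss = F.binary_cross_entropy(soft_student, soft_teacher)

    # Hard targets: standard binary cross-entropy on the oracle labels.
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, hard_labels)

    # alpha balances imitating the teacher against the original objective.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy usage: 2 documents, 5 candidate sentences each.
student = torch.randn(2, 5)
teacher = torch.randn(2, 5)
labels = torch.tensor([[1., 0., 1., 0., 0.], [0., 1., 0., 0., 1.]])
print(distillation_loss(student, teacher, labels))
```

Under this reading, guided training corresponds to supervising the student with the teacher's outputs during training, while distillation training softens those outputs with a temperature before imitation; both interpretations are assumptions based on the abstract's terminology.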

Key words: Query-focused extractive summarization, Extended data set, BERT, Knowledge distillation

CLC Number: TP391