Computer Science ›› 2023, Vol. 50 ›› Issue (2): 310-316. doi: 10.11896/jsjkx.211100039

• Artificial Intelligence •

Unsupervised Script Summarization Based on Pre-trained Model

SU Qi, WANG Hongling, WANG Zhongqing   

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Received: 2021-11-03 Revised: 2022-06-28 Online: 2023-02-15 Published: 2023-02-22
  • Supported by:
    National Natural Science Foundation of China (61976146)

Abstract: A script is a special text structure composed of dialogue between characters and descriptions of scenes. Unsupervised script summarization refers to compressing and condensing a long script into a short text that summarizes its content. This paper therefore proposes an unsupervised script summarization method based on a pre-trained model. By adding a text-sequence-processing task during pre-training, the resulting pre-trained model fully accounts for the dialogue descriptions in the script and the emotional characteristics of the characters. The model is then used to compute the similarity between sentences, which is combined with the TextRank algorithm to score and rank key sentences; finally, the highest-scoring sentences are selected as the summary. Experimental results show that the proposed method outperforms the baseline models, with significant improvements under ROUGE evaluation.
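The pipeline described in the abstract (a pre-trained encoder for sentence similarity, TextRank over the similarity graph, top-scored sentences as the summary) can be illustrated with a short, self-contained sketch. This is not the authors' implementation: the encoder choice (bert-base-uncased via Hugging Face transformers), the mean-pooling step, the toy script lines, and all hyperparameters are assumptions standing in for the paper's custom pre-trained model and its script-specific pre-training tasks.

```python
# Minimal sketch: pre-trained encoder + TextRank for extractive summarization.
# NOTE: bert-base-uncased is a stand-in; the paper trains its own model with
# script-specific pre-training tasks. All hyperparameters are assumptions.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    """Mean-pooled token embeddings: one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # (B, H)

def textrank(sim, d=0.85, iters=50):
    """Power iteration over a row-normalized sentence-similarity graph."""
    sim = np.clip(sim, 0.0, None)          # keep non-negative edges only
    np.fill_diagonal(sim, 0.0)             # no self-loops
    sim /= np.maximum(sim.sum(axis=1, keepdims=True), 1e-8)
    n = sim.shape[0]
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - d) / n + d * sim.T @ scores
    return scores

# Toy script lines (hypothetical example mixing dialogue and scene text).
sentences = [
    "INT. CABIN - NIGHT. Rain hammers the windows.",
    "ALICE: The storm is getting worse. We can't stay here.",
    "BOB: Then we leave before dawn.",
    "Bob stares at the door, saying nothing.",
]
emb = embed(sentences)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # unit vectors
scores = textrank(emb @ emb.T)                      # cosine-similarity graph
top = np.argsort(scores)[::-1][:2]                  # highest-scoring lines
print([sentences[i] for i in sorted(top)])          # summary in script order
```

Selecting the top-k nodes of the graph corresponds to the abstract's "highest-scoring sentences"; in practice k, or a length budget, would be tuned to the target summary length.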

Key words: Pre-trained model, Pre-training task, Script summary, Unsupervised, Sentence similarity, Dialogue

CLC Number: TP391

[1]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of deep bidirectional transformers for language understanding[J].arXiv:1810.04805,2018.
[2]MIHALCEA R,TARAU P.TextRank:Bringing order into text[C]//Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.2004:404-411.
[3]LIU Y.Fine-tune BERT for extractive summarization[J].arXiv:1903.10318,2019.
[4]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems.2017:5998-6008.
[5]KANO R,MIURA Y,TANIGUCHI T,et al.Identifying Implicit Quotes for Unsupervised Extractive Summarization of Conversations[C]//Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing.2020:291-302.
[6]PAPALAMPIDI P,KELLER F,FRERMANN L,et al.Screenplay Summarization Using Latent Narrative Structure[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020:1920-1933.
[7]ZHOU Q,WEI F,ZHOU M.At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization[C]//Proceedings of the 28th International Conference on Computational Linguistics.2020:5617-5628.
[8]FENG X,FENG X,QIN L,et al.Language model as an annotator:Exploring DialoGPT for dialogue summarization[J].arXiv:2105.12544,2021.
[9]ZOU Y,ZHU B,HU X,et al.Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining[J].arXiv:2109.04080,2021.
[10]CHEN J,YANG D.Simple Conversational Data Augmentation for Semi-supervised Abstractive Dialogue Summarization[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.2021:6605-6616.
[11]ZHAO L,ZENG W,XU W,et al.Give the Truth:Incorporate Semantic Slot into Abstractive Dialogue Summarization[C]//Findings of the Association for Computational Linguistics:EMNLP.2021:2435-2446.
[12]ZOU Y,ZHAO L,KANG Y,et al.Topic-oriented spoken dialogue summarization for customer service with saliency-aware topic modeling[J].arXiv:2012.07311,2020.
[13]DAI A M,LE Q V.Semi-supervised sequence learning[J].Advances in Neural Information Processing Systems,2015,28:3079-3087.
[14]ZHANG H,CAI J,XU J,et al.Pretraining-Based Natural Language Generation for Text Summarization[C]//Proceedings of the 23rd Conference on Computational Natural Language Learning(CoNLL).2019:789-797.
[15]LIU Y,LAPATA M.Text Summarization with Pretrained Encoders[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing(EMNLP-IJCNLP).2019:3730-3740.
[16]LI R N.Research on Semantic-based Text Similarity Calculation Method[D].Beijing:Beijing University of Technology,2018.
[17]HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[18]PAGE L,BRIN S,MOTWANI R,et al.The PageRank citation ranking:Bringing order to the web[R].Stanford InfoLab,1999.
[19]LIN C Y.ROUGE:A package for automatic evaluation of summaries[C]//Text Summarization Branches Out.2004:74-81.
[20]DONG L,YANG N,WANG W,et al.Unified language model pre-training for natural language understanding and generation[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019:13063-13075.
[21]SUTSKEVER I,VINYALS O,LE Q V.Sequence to sequence learning with neural networks[C]//Advances in Neural Information Processing Systems.2014:3104-3112.