Computer Science (计算机科学), 2025, Vol. 52, Issue 12: 215-223. doi: 10.11896/jsjkx.241000136
李彤亮1, 李奇峰1, 侯霞1, 陈小明2, 李舟军3
LI Tongliang1, LI Qifeng1, HOU Xia1, CHEN Xiaoming2, LI Zhoujun3
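Abstract: Dialogue topic segmentation (DTS) aims to divide a multi-turn dialogue automatically into topical segments so that its content can be understood and processed more precisely, and it plays an important role in dialogue modeling. Traditional DTS methods rely mainly on semantic similarity and dialogue coherence to segment dialogues without supervision, but these features cannot fully capture complex topic transitions, and unlabeled dialogue data remains underexploited. To address this, recent DTS methods learn topic-aware utterance representations from dialogue data through adjacent-utterance matching and pseudo-segmentation, extracting further useful cues from unlabeled dialogues. However, coreference and ellipsis, both common in multi-turn dialogue, can distort the computation of semantic similarity and thereby reduce the accuracy of adjacent-utterance matching. To solve this problem and make full use of the cues carried by dialogue relations, this paper proposes a novel unsupervised DTS method that combines utterance rewriting (UR) with an unsupervised learning algorithm. The method rewrites utterances that contain coreference and ellipsis, restoring them to complete expressions so that topical cues in the dialogue are easier to capture. Experimental results show that the proposed utterance-rewriting topic segmentation model (UR-DTS) substantially improves segmentation accuracy and reaches the current state of the art. On the DialSeg711 dataset, the two error metrics Pk and WindowDiff (WD) both improve by about 6 percentage points, reaching 11.42% and 12.97%, respectively. On the more complex Doc2Dial dataset, Pk and WD improve by 3 and 2 percentage points, reaching 35.17% and 38.49%, respectively. Because both metrics are error scores, lower values are better. These results indicate that UR-DTS has a clear advantage in capturing dialogue topic transitions and greater potential for exploiting unlabeled dialogue data.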
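The Pk and WD figures above are error rates computed with a sliding window over the dialogue. As a minimal sketch of the usual definitions (not the authors' evaluation code), the Python below computes both from per-utterance segment labels. The function names `boundaries`, `default_window`, `pk`, and `window_diff` are hypothetical, and the default window size (half the mean reference segment length) is a common convention assumed here rather than something stated in the abstract.

```python
from typing import List, Optional


def boundaries(labels: List[int]) -> List[int]:
    """Return 1 where a topic boundary falls between utterance j and j+1, else 0."""
    return [int(a != b) for a, b in zip(labels, labels[1:])]


def default_window(ref_labels: List[int]) -> int:
    """Assumed convention: half the mean reference segment length, at least 1."""
    n_segments = sum(boundaries(ref_labels)) + 1
    return max(1, round(len(ref_labels) / (2 * n_segments)))


def _window_errors(ref_labels, hyp_labels, k, count_sensitive):
    if len(ref_labels) != len(hyp_labels):
        raise ValueError("reference and hypothesis must have the same length")
    if k is None:
        k = default_window(ref_labels)
    n_windows = len(ref_labels) - k
    if n_windows <= 0:
        raise ValueError("dialogue is too short for the chosen window size")
    ref_b, hyp_b = boundaries(ref_labels), boundaries(hyp_labels)
    errors = 0
    for i in range(n_windows):
        # Boundaries lying between utterance i and utterance i + k.
        ref_count = sum(ref_b[i:i + k])
        hyp_count = sum(hyp_b[i:i + k])
        if count_sensitive:
            # WindowDiff: the number of boundaries in the window must match.
            errors += int(ref_count != hyp_count)
        else:
            # Pk: only "same segment or not" must match.
            errors += int((ref_count == 0) != (hyp_count == 0))
    return errors / n_windows


def pk(ref_labels: List[int], hyp_labels: List[int], k: Optional[int] = None) -> float:
    """Pk error: fraction of windows whose same-segment status disagrees."""
    return _window_errors(ref_labels, hyp_labels, k, count_sensitive=False)


def window_diff(ref_labels: List[int], hyp_labels: List[int], k: Optional[int] = None) -> float:
    """WindowDiff error: fraction of windows whose boundary counts disagree."""
    return _window_errors(ref_labels, hyp_labels, k, count_sensitive=True)


# Toy example: the reference has one boundary, the hypothesis places two nearby.
reference = [0, 0, 0, 0, 1, 1, 1, 1]
hypothesis = [0, 0, 0, 1, 2, 2, 2, 2]
pk_score = pk(reference, hypothesis)
wd_score = window_diff(reference, hypothesis)
```

In this toy case the window size is 2, and WD is higher than Pk because WD also penalizes the window in which the hypothesis contains two boundaries where the reference contains one. A perfect segmentation scores 0 on both metrics.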