Computer Science ›› 2025, Vol. 52 ›› Issue (12): 215-223. doi: 10.11896/jsjkx.241000136

• Artificial Intelligence •

Unsupervised Dialogue Topic Segmentation Method Based on Utterance Rewriting

LI Tongliang1, LI Qifeng1, HOU Xia1, CHEN Xiaoming2, LI Zhoujun3   

  1 School of Computer Science, Beijing Information Science & Technology University, Beijing 102206, China
  2 Shenzhen Intelligent Strong Technology Co., Ltd., Shenzhen, Guangdong 518052, China
  3 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Received: 2024-10-18 Revised: 2025-01-08 Online: 2025-12-15 Published: 2025-12-09
  • About the authors: LI Tongliang, born in 1992, Ph.D., lecturer. His main research interests include artificial intelligence, natural language processing and large language models.
    CHEN Xiaoming, born in 1980, master, engineer. His main research interests include artificial intelligence and intelligent document processing.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (62406033, 62276017, U1636211, 61672081) and the University-Industry Collaborative Education Program (231004723052336).

Abstract: The Dialogue Topic Segmentation (DTS) task aims to automatically divide a multi-turn conversation into topic segments, enabling more precise understanding and processing of dialogue content, and it plays an important role in dialogue modeling. Traditional DTS methods rely primarily on semantic similarity and dialogue coherence to perform unsupervised topic segmentation, but these features are often insufficient to capture complex topic transitions in conversations, and unannotated dialogue data has not been fully explored and utilized. To address this issue, recent DTS methods employ adjacent utterance matching and pseudo-segmentation to learn topic-aware representations from dialogue data, extracting further useful cues from unannotated dialogues. However, phenomena that are common in multi-turn dialogues, such as coreference and ellipsis, can distort the calculation of semantic similarity and thereby reduce the accuracy of adjacent utterance matching. To solve this problem and fully leverage the useful cues in dialogue relationships, this study proposes a novel unsupervised DTS method that combines utterance rewriting (UR) techniques with unsupervised learning algorithms. The approach rewrites coreferential and elliptical expressions in the dialogue to restore them to their complete forms, so that the topical cues in the conversation are captured more reliably. Experimental results show that the proposed utterance rewriting topic segmentation model (UR-DTS) significantly improves topic segmentation accuracy, achieving state-of-the-art performance. On the DialSeg711 dataset, the error metrics Pk and WinDiff (WD) improve by approximately 6 percentage points, reaching 11.42% and 12.97%, respectively. On the more complex Doc2Dial dataset, Pk and WD improve by 3 and 2 percentage points, reaching 35.17% and 38.49%, respectively. These results demonstrate that UR-DTS has a significant advantage in capturing topic transitions in conversations and shows greater potential for leveraging unannotated dialogue data.
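The two metrics reported above are error rates, so lower values are better. The following is a minimal sketch of how Pk and WD are commonly computed, not the authors' evaluation code. It assumes each segmentation is given as one integer segment label per utterance, with labels increasing by 1 at every topic boundary. The function names `default_window`, `pk` and `window_diff` are illustrative, and the default window (half the mean reference segment length) is the usual convention rather than a setting confirmed by the abstract.

```python
from typing import List, Optional


def default_window(ref: List[int]) -> int:
    """Half the mean reference segment length, with a floor of 1."""
    n_segments = ref[-1] - ref[0] + 1
    return max(1, len(ref) // (2 * n_segments))


def _check(ref: List[int], hyp: List[int], k: Optional[int]) -> int:
    """Validate the inputs and return the window size to use."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same utterances")
    k = k or default_window(ref)
    if len(ref) - k <= 0:
        raise ValueError("dialogue is too short for this window size")
    return k


def pk(ref: List[int], hyp: List[int], k: Optional[int] = None) -> float:
    """Share of windows where ref and hyp disagree on whether the two
    window ends lie in the same segment."""
    k = _check(ref, hyp, k)
    windows = len(ref) - k
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(windows)
    )
    return errors / windows


def window_diff(ref: List[int], hyp: List[int], k: Optional[int] = None) -> float:
    """Share of windows where ref and hyp contain a different number of
    boundaries (label difference equals boundary count for consecutive labels)."""
    k = _check(ref, hyp, k)
    windows = len(ref) - k
    errors = sum(
        (ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i])
        for i in range(windows)
    )
    return errors / windows


# Toy 8-utterance dialogue with one topic shift after the 4th utterance.
reference = [0, 0, 0, 0, 1, 1, 1, 1]
missed = [0, 0, 0, 0, 0, 0, 0, 0]  # hypothesis that finds no boundary

print(pk(reference, reference), window_diff(reference, reference))
print(pk(reference, missed), window_diff(reference, missed))
```

A perfect segmentation scores 0 on both metrics, and a hypothesis that misses the boundary scores above 0. An "improvement" in Pk or WD therefore means a decrease in the reported percentage.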

Key words: Multi-turn dialogue, Unsupervised learning, Natural language understanding, Doc2Dial

CLC Number: 

  • TP391