Computer Science ›› 2025, Vol. 52 ›› Issue (10): 266-274. doi: 10.11896/jsjkx.250100023
赵金爽, 黄德根
ZHAO Jinshuang, HUANG Degen
Abstract: The faithfulness of a text summary, i.e., its factual consistency with the source document, is of great importance for the practical application of automatic text summarization. Existing summary-faithfulness evaluation methods make insufficient use of text summarization datasets, and the unfaithful summaries they construct differ markedly from the source documents, which limits the effectiveness of these methods. To address this problem, a summary-faithfulness evaluation model based on data augmentation and two-stage training, FaithEval, is proposed. First, two data augmentation methods are defined, namely same-topic similarity retrieval and extrapolative mask filling, to generate unfaithful summaries that remain related to the content of the source document; these methods are applied to derive training data from text summarization datasets. Then, to make full use of the information in the datasets, the model is trained in two stages on training data built from the source documents and the reference summaries, progressively strengthening its faithfulness-evaluation ability. Finally, a summary-faithfulness evaluation test set, SFETS, is constructed manually to provide a benchmark for assessing model performance. Experimental results show that FaithEval performs well on both the SFETS and Rank19 datasets, and achieves state-of-the-art results on SFETS in particular.
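The abstract does not spell out how the two augmentation strategies are implemented. The following is a minimal, hypothetical Python sketch of one way to realize them and to assemble the resulting training triples; the function names (same_topic_retrieval, extrapolative_mask_filling, build_training_pairs) and the injected fill_fn infilling function are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of the two augmentation ideas described in the abstract.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def same_topic_retrieval(reference_summaries, idx):
    """For the summary at position `idx`, retrieve the most similar summary
    written for a *different* document; pairing it with the original source
    yields a topically related but unfaithful negative example."""
    # Character n-grams so the similarity also works for unsegmented Chinese text.
    vectors = TfidfVectorizer(analyzer="char", ngram_range=(1, 2)).fit_transform(reference_summaries)
    sims = cosine_similarity(vectors[idx], vectors).ravel()
    sims[idx] = -1.0  # exclude the summary of the same document
    return reference_summaries[int(sims.argmax())]


def extrapolative_mask_filling(reference_summary, fill_fn, mask_ratio=0.3):
    """Mask a fraction of the characters in the reference summary and let a
    language model fill them in; the infilled text stays close to the original
    wording but may introduce factual errors, giving another negative example.
    `fill_fn` is any masked-LM infilling function (e.g. a fill-mask pipeline);
    it is passed in to keep the sketch model-agnostic."""
    tokens = list(reference_summary)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    for pos in random.sample(range(len(tokens)), n_mask):
        tokens[pos] = "[MASK]"
    return fill_fn("".join(tokens))


def build_training_pairs(sources, reference_summaries, fill_fn):
    """Assemble (source, summary, label) triples: label 1 for the reference
    summary, label 0 for the two kinds of constructed unfaithful summaries."""
    pairs = []
    for i, (src, ref) in enumerate(zip(sources, reference_summaries)):
        pairs.append((src, ref, 1))
        pairs.append((src, same_topic_retrieval(reference_summaries, i), 0))
        pairs.append((src, extrapolative_mask_filling(ref, fill_fn), 0))
    return pairs
```

Triples of this kind, built only from the source documents and reference summaries of an existing summarization dataset, are what the two-stage training described above would consume; how the stages differ is not detailed in the abstract.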