Computer Science, 2021, Vol. 48, Issue (12): 331-336. doi: 10.11896/jsjkx.210500028

• Artificial Intelligence •

Abstractive Automatic Summarizing Model for Legal Judgment Documents

ZHOU Wei1, WANG Zhao-yu1, WEI Bin2

  1 School of Information Management for Law, China University of Political Science and Law, Beijing 102249, China
  2 Institute of Digital Jurisprudence, Zhejiang University, Hangzhou 310008, China
  • Received: 2021-05-06 Revised: 2021-07-15 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: WEI Bin (srsysj@zju.edu.cn)
  • About author: ZHOU Wei (zhouwei@cupl.edu.cn), born in 1985, assistant professor, Ph.D. His main research interests include legal service and judicial management technology, and legal information management.
    WEI Bin, born in 1986, professor in the Hundred Talents Program, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include AI & Law, knowledge representation and legal logic.
  • Supported by:
    Research and Innovation Project of CUPL (21ZFQ82005), Key R&D Program of Zhejiang Province (2020C01060), National Key R&D Program of China (2018YFC0831800), Major Project of the National Social Science Foundation of China (20&ZD047) and Fundamental Research Funds for the Central Universities.

Abstract: Automatic summarization models for Chinese text currently rely mainly on extractive methods when applied to legal judgment documents. Because legal texts are long and loosely structured, however, extractive summaries fall short in accuracy and reliability. To obtain high-quality summaries of legal judgment documents, this paper proposes an abstractive automatic summarization method based on multi-model fusion. Building on the Seq2Seq model, it introduces an attention mechanism and selective gates, and combines them with Bert pre-training and reinforcement learning to form the BASR (Bert Based Attention Seq2Seq Reinforced Model) model. The corpus consists of 50 000 legal judgment documents, with judgments from small claims and summary procedures as the representative objects of study. Experiments on this corpus show that the proposed model outperforms all baseline models, with a mean ROUGE score 5.81% higher than that of the conventional Seq2Seq+Attention model.

Key words: Judgment documents, Automatic summarization, Model fusion, Seq2Seq, Attention mechanism, Reinforcement learning
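
The abstract reports its results as ROUGE scores. As a rough illustration of what such a score measures, the sketch below computes ROUGE-N from clipped n-gram overlap between a candidate summary and a reference summary. It is a minimal sketch under stated assumptions, not the evaluation code used in the paper: the function names `ngrams` and `rouge_n`, the character-level tokenization, and the example sentences are all hypothetical, and the abstract does not say which ROUGE variants or tokenization were used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sentence pair, split into characters (an assumption; the
# paper's actual tokenization is not stated in the abstract).
reference = list("原告请求被告偿还借款")
candidate = list("原告请求被告还款")

p1, r1, f1 = rouge_n(candidate, reference, n=1)
p2, r2, f2 = rouge_n(candidate, reference, n=2)
```

Here every character of the candidate also appears in the reference, so unigram precision is 1.0, while recall is 0.8 because two of the ten reference characters are missing. A mean ROUGE figure such as the one in the abstract would average scores of this kind over several variants and over all test documents.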

CLC number:

  • TP18