Computer Science ›› 2021, Vol. 48 ›› Issue (12): 331-336. doi: 10.11896/jsjkx.210500028

• Artificial Intelligence •

Abstractive Automatic Summarizing Model for Legal Judgment Documents

ZHOU Wei1, WANG Zhao-yu1, WEI Bin2

  1 School of Information Management for Law, China University of Political Science and Law, Beijing 102249, China
  2 Institute of Digital Jurisprudence, Zhejiang University, Hangzhou 310008, China
  • Received: 2021-05-06 Revised: 2021-07-15 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: WEI Bin (srsysj@zju.edu.cn)
  • About author: ZHOU Wei (zhouwei@cupl.edu.cn), born in 1985, assistant professor, Ph.D. His main research interests include legal service and judicial management technology, and legal information management.
    WEI Bin, born in 1986, professor of Hundred Talents Program, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include AI & Law, knowledge representation and legal logic.
  • Supported by:
    Research and Innovation Project of CUPL (21ZFQ82005), Key R&D Program of Zhejiang Province (2020C01060), Key R&D Projects of the Ministry of Science and Technology (2018YFC0831800), Key Project of National Social Science Foundation (20&ZD047) and Fundamental Research Funds for the Central Universities.

Abstract: At present, automatic summarization models for Chinese content mainly adopt extractive methods when applied to legal judgment documents. However, because legal texts are lengthy and loosely structured, extractive summaries lack the accuracy and reliability needed for practical application. To obtain high-quality summaries of legal judgment documents, this paper proposes an abstractive automatic summarization method based on multi-model fusion. Building on the Seq2Seq model, we introduce an attention mechanism and selective gates to better process the input, and combine Bert pre-training with a reinforcement learning policy to optimize the model. The resulting model is called BASR (Bert Based Attention Seq2Seq Reinforced Model). The corpus consists of 50 000 legal judgment documents, with judgments from the small claims procedure and the summary procedure taken as representative research objects. Evaluations on the corpus show that the proposed model outperforms all of the baseline models, and its mean ROUGE score is 5.81% higher than that of the conventional Seq2Seq+Attention model.

Key words: Attention mechanism, Automatic summarization, Judgment documents, Model fusion, Reinforcement learning, Seq2Seq
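
The abstract reports its result as a mean ROUGE score. As a rough illustration of what such a score measures, the following minimal sketch computes ROUGE-1, ROUGE-2 and ROUGE-L F1 values for one candidate summary against one reference and averages them. It is not the authors' evaluation code: the function names (`rouge_n`, `rouge_l`, `mean_rouge`), the whitespace tokenization, the use of F1 rather than recall, and the choice to average exactly these three variants are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 based on clipped n-gram overlap (illustrative helper)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence (illustrative helper)."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


def mean_rouge(candidate, reference):
    """Average of ROUGE-1, ROUGE-2 and ROUGE-L; this averaging is an assumption."""
    scores = [
        rouge_n(candidate, reference, 1),
        rouge_n(candidate, reference, 2),
        rouge_l(candidate, reference),
    ]
    return sum(scores) / len(scores)


# Toy example with whitespace tokens (an assumption; Chinese text is often
# scored at the character level instead).
reference = "the court dismissed the claim".split()
candidate = "the court rejected the claim".split()
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
print(mean_rouge(candidate, reference))
```

Published ROUGE figures depend on tokenization, on whether recall or F1 is reported, and on which variants are averaged, so numbers from a sketch like this are not directly comparable with the 5.81% figure in the abstract.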

CLC Number: 

  • TP18