Computer Science ›› 2021, Vol. 48 ›› Issue (8): 80-85. doi: 10.11896/jsjkx.210300130

Special topic: Natural Language Processing (virtual special issue)

• Database & Big Data & Data Science •

Study on Judicial Data Classification Method Based on Natural Language Processing Technologies

(Chinese title: "Research on a Deep Learning-Based Method for Classifying Civil Case Judgment Results")

WANG Li-mei1, ZHU Xu-guang2,3, WANG De-jia3, ZHANG Yong4, XING Chun-xiao4

  1 School of Criminal Justice, China University of Political Science and Law, Beijing 100088, China
  2 Institute of Cyber Law, China University of Political Science and Law, Beijing 100088, China
  3 Jiangsu PayEgis Technology Co., Ltd., Suzhou, Jiangsu 215000, China
  4 Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2021-03-12 Revised: 2021-05-21 Published: 2021-08-10
  • Corresponding author: ZHU Xu-guang (xuguangs@yeah.net)
  • About author: WANG Li-mei, born in 1974, Ph.D., professor. Her main research interests include cyber law. (limeiw@cupl.edu.cn) ZHU Xu-guang, born in 1989, master's degree. His main research interests include artificial intelligence and cyber law.
  • Supported by:
    National Key R&D Program (2018YFC0831202).

Abstract: The rapid growth in the number of judgment documents creates an urgent need for automated classification. However, existing studies lack methods that take the judgment result as the classification criterion within the subfield of civil cases, so they cannot classify civil case judgment results accurately. This paper applies deep learning to the classification of civil case judgment results, identifies the better-performing model in this field through a side-by-side comparison of several deep learning models, and further optimizes that model according to the data characteristics of judgment documents. The experimental results show that the Transformer model achieves higher macro precision, macro recall, and macro F1 score for judgment result classification than the other models. Optimizing the data preprocessing pipeline and the position embedding method of the Transformer model improves the model's performance metrics by 1%~2%.

Key words: Big data, Classification, Deep learning, Judgment documents, Judicial data, Natural language processing

CLC Number:

  • TP391.4
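The abstract compares models by macro precision, macro recall, and macro F1 score. The sketch below shows one common way to compute these macro-averaged metrics for a multi-class task. It assumes that macro F1 is the unweighted mean of the per-class F1 scores; the paper's page does not state which convention the authors used. The function name `macro_scores` and the label names in the usage example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) for two label lists.

    Assumption: macro F1 is the unweighted mean of per-class F1 scores.
    A class with no predictions (or no true samples) contributes 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Macro averaging: every class counts equally, regardless of its size.
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical judgment-result labels, for illustration only.
    truth = ["upheld", "upheld", "dismissed", "dismissed"]
    preds = ["upheld", "dismissed", "dismissed", "dismissed"]
    print(macro_scores(truth, preds))
```

Because each class contributes equally to the average, macro metrics give rare judgment outcomes the same weight as frequent ones, which is a plausible reason to prefer them when class sizes are imbalanced.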