Computer Science, 2023, Vol. 50, Issue (2): 300-309. doi: 10.11896/jsjkx.220800169

• Artificial Intelligence •

Study on Abductive Analysis of Auto Insurance Fraud Based on Network Representation Learning

LI Weizhuo1,3,4, LU Bingjie2, YANG Junming1, NA Chongning2

    1 School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
    2 Fintech Research Center, Zhejiang Lab, Hangzhou 311100, China
    3 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
    4 Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211189, China
  • Received: 2022-08-16 Revised: 2022-11-06 Online: 2023-02-15 Published: 2023-02-22
  • Corresponding author: LU Bingjie (lubj@zhejianglab.com)
  • About the author: (liweizhuo@amss.ac.cn)
  • Supported by:
    National Natural Science Foundation of China (62006125), Foundation of Jiangsu Provincial Double-Innovation Doctor Program (JSSCBS20210532), NUPTSF (NY220171) and Key Research Project of Zhejiang Lab (2020NF0AC01, 2022NF0AC01)

Abstract: Auto insurance fraud detection plays an important role in the healthy development of the auto insurance industry. Because a fraud judgment touches on core matters such as civil rights, auto insurance experts must review each case and state the reasons for fraud. Machine learning methods generalize well and achieve high accuracy but lack interpretability, whereas rule-based expert systems are interpretable but are limited by the complex trigger conditions of their rules. To explain cases that machine learning methods flag as “fraud” without triggering any fraud rule in the expert system, this paper proposes an abductive analysis method for auto insurance fraud based on network representation learning. The method first defines the abductive analysis task for auto insurance fraud: for cases identified as “fraud” by machine learning methods without triggering the expert system, it returns a ranking of the most likely fraud rules to auto insurance experts. It then uses network representation learning to model a case-rule factor network from the fraud cases that have triggered expert system rules, and learns distributed vector representations of the factors in the fraud rules. To better measure the similarity between “fraud” cases and expert system rules whose factors are not all triggered, a weighted concatenation strategy for rule factors is designed based on the default principle of abductive reasoning, which alleviates the problem of insufficient training data to some extent. Experimental results show that the proposed method outperforms existing methods on all three metrics of the auto insurance fraud abductive prediction task.

Key words: Auto insurance fraud, Network representation learning, Abductive reasoning, Expert system, Interpretability
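
The ranking step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the factor names, the toy embedding values, the per-factor weights, and the use of cosine similarity are all hypothetical, since the abstract does not specify them. Each rule is represented as the concatenation of its weighted factor vectors, a case is represented against that rule by zeroing out the factors it did not trigger, and rules are ranked by the similarity of the two.

```python
import math

# Toy factor embeddings (hypothetical values; in the paper these would be
# learned from the case-rule factor network).
FACTOR_VECS = {
    "night_accident": [1.0, 0.0],
    "new_policy": [0.0, 1.0],
    "repeat_claimant": [1.0, 1.0],
    "no_police_report": [0.5, 0.5],
}

# Hypothetical fraud rules: rule name -> {factor name: assumed weight}.
RULES = {
    "R1": {"night_accident": 0.5, "new_policy": 0.5},
    "R2": {"repeat_claimant": 0.7, "no_police_report": 0.3},
    "R3": {"night_accident": 0.2, "repeat_claimant": 0.8},
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rule_vector(rule):
    """Concatenate the weighted embeddings of a rule's factors (sorted by name)."""
    vec = []
    for factor, weight in sorted(rule.items()):
        vec.extend(weight * x for x in FACTOR_VECS[factor])
    return vec


def case_vector(rule, triggered):
    """Lay the case out in the rule's factor order; untriggered factors become zeros."""
    vec = []
    for factor in sorted(rule):
        emb = FACTOR_VECS[factor]
        vec.extend(emb if factor in triggered else [0.0] * len(emb))
    return vec


def rank_rules(triggered):
    """Return (rule name, score) pairs, most similar rule first."""
    scores = {
        name: cosine(rule_vector(rule), case_vector(rule, triggered))
        for name, rule in RULES.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # A case that triggered only one factor, so no rule fired completely.
    for name, score in rank_rules({"repeat_claimant"}):
        print(name, round(score, 4))
```

In this toy setup a case that triggered only `repeat_claimant` is ranked closest to the rule that weights that factor most heavily, which is the kind of ordered list of candidate rules an expert would review.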

CLC Number:

  • TP391