Study on Credit Anti-fraud Based on Heterogeneous Information Network

doi:10.11896/jsjkx.221100173

Abstract

In recent years, the digitization of mobile terminal devices has risen sharply, and fraud in the credit industry has taken on new characteristics such as dynamic evolution, concealed behavior, and professional camouflage. The order-of-magnitude growth of data poses considerable challenges to both the effectiveness and the computational efficiency of traditional anti-fraud algorithms. To fully learn the interaction information between different entities in credit scenarios, and to reduce computational cost so that the method suits large-scale graph data tasks, this paper proposes BKH-II (Bron-Kerbosch-H-II), a specific group mining algorithm based on heterogeneous information networks. First, the credit entities in the source data and the relationships between them are defined and classified, and the similarity between entities is used as the relationship weight to build a credit heterogeneous information network. A two-stage H-graph-based maximal clique enumeration algorithm is then applied to this network to mine specific groups. Finally, the partition is corrected through local feature engineering to obtain potential fraud groups. Experiments show that BKH-II scores NMI=0.983, NRI=0.96, F-score=0.943, and Omega=0.95 on the four evaluation metrics, and that it exhibits good generalization and low computational complexity.

Key words: Heterogeneous information network, Credit anti-fraud, Specific group mining, Community discovery, Graph embedding

CLC number: TP391

LIU Hualing, ZHANG Guoxiang, WANG Liuyue, LIANG Huabi. Study on Credit Anti-fraud Based on Heterogeneous Information Network[J]. Computer Science, 2023, 50(11A): 221100173-9. https://doi.org/10.11896/jsjkx.221100173
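
The abstract describes weighting relationships by entity similarity and then enumerating maximal cliques to find candidate groups. The abstract does not specify the two-stage H-graph-based procedure of BKH-II, so the following is only a minimal sketch of the classic Bron-Kerbosch enumeration with pivoting, which BKH-II builds on by name. The entity identifiers, the similarity values, the `min_similarity` threshold, and both function names are hypothetical and chosen purely for illustration.

```python
from typing import Dict, FrozenSet, Hashable, Iterator, List, Set, Tuple

Node = Hashable


def build_adjacency(
    weighted_edges: List[Tuple[Node, Node, float]], min_similarity: float
) -> Dict[Node, Set[Node]]:
    """Keep only edges whose similarity weight reaches min_similarity.

    The threshold is an assumption of this sketch, not a value from the paper.
    """
    adjacency: Dict[Node, Set[Node]] = {}
    for u, v, weight in weighted_edges:
        adjacency.setdefault(u, set())
        adjacency.setdefault(v, set())
        if u != v and weight >= min_similarity:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def maximal_cliques(adjacency: Dict[Node, Set[Node]]) -> Iterator[FrozenSet[Node]]:
    """Enumerate maximal cliques with classic Bron-Kerbosch plus pivoting."""

    def expand(r: Set[Node], p: Set[Node], x: Set[Node]) -> Iterator[FrozenSet[Node]]:
        if not p and not x:
            # No candidate can extend r and no excluded node covers it.
            yield frozenset(r)
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda n: len(adjacency[n] & p))
        for v in list(p - adjacency[pivot]):
            yield from expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adjacency), set())


# Hypothetical toy network: applicants (a*), a device (d1), a phone number (p1).
edges = [
    ("a1", "a2", 0.90),
    ("a1", "a3", 0.80),
    ("a2", "a3", 0.85),
    ("a3", "d1", 0.70),
    ("a1", "d1", 0.20),  # below the threshold, so this edge is dropped
    ("d1", "p1", 0.60),
]
adjacency = build_adjacency(edges, min_similarity=0.5)
cliques = set(maximal_cliques(adjacency))
for clique in sorted(cliques, key=lambda c: sorted(map(str, c))):
    print(sorted(clique))
```

In this toy network the three applicants form one tightly connected group, while the device and phone number only appear in pairs. Any later correction step, such as the local feature engineering mentioned in the abstract, would operate on groups like these and is not shown here.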

