Computer Science (计算机科学) ›› 2018, Vol. 45 ›› Issue (7): 122-128. doi: 10.11896/j.issn.1002-137X.2018.07.020

• Information Security •

Probabilistic Graphical Model Based Approach for Bank Telecommunication Fraud Detection

LIU Xiao, WANG Xiao-guo

  1. Department of Computer Science and Technology, College of Electronics and Information Engineering, Tongji University, Shanghai 201800, China
  • Received: 2018-03-01 Online: 2018-07-30 Published: 2018-07-30
  • About the authors: LIU Xiao (born 1989), male, Ph.D. candidate; his main research interests are data mining and fraud detection. E-mail: nanaya100@gmail.com. WANG Xiao-guo (born 1966), male, Ph.D., professor; his main research interests are data mining and enterprise informatization. E-mail: xiaoguowang@tongji.edu.cn (corresponding author).

Abstract: In recent years, fraud carried out over telecommunication networks has become frequent and has caused enormous economic losses for bank users. Existing bank fraud detection methods typically first extract RFM (Recency, Frequency, Monetary Value) features from account transactions and then train a classifier with supervised learning to identify fraudulent transactions or accounts. However, these methods do not take the structural features of the transaction network into account. Telecommunication fraud is typically carried out by organized groups, so it produces characteristic structures in the transaction network, and exploiting these structural features helps to identify it. To capture this group characteristic, this paper designs a pairwise Markov random field for identifying the fraudulent accounts involved in telecommunication fraud. It gives a linear iterative update for this Markov network (a linearized loopy belief propagation algorithm that estimates the posterior probability distribution and predicts the label of each account) and proves the theoretical condition under which the iteration converges. Finally, the proposed method is evaluated on both synthetic and real-world datasets and compared with CIA and SybilRank. The experimental results show that the proposed method has a lower false positive rate and better robustness to noise. On real-world data, combining the proposed method with a method based on account transaction features achieves better detection performance than either method alone, and improves the F1-score of the RFM-feature-based method.

Key words: Data mining, Fraud detection, Markov random field, Semi-supervised learning, Telecommunication fraud
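The abstract does not reproduce the paper's update rule or its convergence condition, so the following is only a generic sketch of the kind of linear iteration it describes, not the authors' formulation. It assumes an undirected transaction graph, centered prior scores (positive leans fraudulent, negative leans benign, zero means unlabeled) and a single scalar coupling strength. The function name `propagate_scores` and the parameters `coupling`, `tol` and `max_iter` are hypothetical. The convergence check is a simple sufficient condition, `coupling * max_degree < 1`, which bounds the spectral radius of the scaled adjacency matrix below 1.

```python
def propagate_scores(edges, priors, coupling=0.1, tol=1e-9, max_iter=1000):
    """Iterate score = prior + coupling * (sum of neighbor scores).

    edges:  iterable of (u, v) account pairs, treated as undirected.
    priors: dict mapping account -> centered prior score
            (> 0 leans fraudulent, < 0 leans benign); accounts that only
            appear in `edges` get a prior of 0.0 (unlabeled).
    """
    # Build undirected adjacency sets.
    neighbors = {node: set() for node in priors}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Sufficient (not necessary) condition for the iteration to converge:
    # the spectral radius of an adjacency matrix is at most the max degree.
    max_degree = max((len(nbrs) for nbrs in neighbors.values()), default=0)
    if coupling * max_degree >= 1:
        raise ValueError("coupling * max_degree must be < 1")

    prior = {node: priors.get(node, 0.0) for node in neighbors}
    scores = dict(prior)
    for _ in range(max_iter):
        updated = {
            node: prior[node] + coupling * sum(scores[nb] for nb in neighbors[node])
            for node in neighbors
        }
        delta = max((abs(updated[n] - scores[n]) for n in neighbors), default=0.0)
        scores = updated
        if delta < tol:  # stop once the scores no longer change
            break
    return scores


# Toy example: one labeled fraud account "a", one labeled benign account "x",
# two triangles joined by a single transaction between "c" and "z".
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("x", "y"), ("y", "z"), ("x", "z"), ("c", "z")]
toy_scores = propagate_scores(toy_edges, {"a": 0.5, "x": -0.5})
predicted_fraud = sorted(n for n, s in toy_scores.items() if s > 0)
```

In this toy graph the unlabeled accounts inherit the sign of the labeled account in their own triangle, which is the guilt-by-association effect that structural features are meant to capture.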

CLC Number:

  • TP391