Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600199-7. doi: 10.11896/jsjkx.230600199

• Big Data & Data Science •

CTGANBoost: Credit Fraud Detection Based on CTGAN and Boosting

ZHUO Peiyan, ZHANG Yaona, LIU Wei, LIU Zijin, SONG You   

  1. School of Software, Beihang University, Beijing 100191, China
  • Published: 2024-06-06
  • About author: ZHUO Peiyan, born in 1999, postgraduate. Her main research interests include data mining and financial technology.
    SONG You, born in 1973, professor, Ph.D. supervisor. His main research interests include data analysis techniques, financial technology, information processing, and knowledge graph.
  • Supported by:
    Key Research and Development Program of Hebei Province, China (21310101D).

Abstract: In the financial industry, credit fraud detection is an important task that can substantially reduce economic losses for banks and consumer institutions. However, credit data suffer from class imbalance and from overlapping features between positive and negative samples, which lead to low sensitivity in recognizing the minority class and poor discrimination between classes. To address these problems, a method called CTGANBoost is proposed for credit fraud detection. First, in each Boosting iteration of AdaBoost, a conditional tabular generative adversarial network (CTGAN) constrained by class label information is introduced to learn the feature distribution and augment the minority class. Second, based on the augmented dataset synthesized by CTGAN, a weight normalization method is designed to ensure that the distribution characteristics and relative weights of the original dataset are maintained during sample weighting. Experimental results on three open-source datasets show that CTGANBoost outperforms other mainstream credit fraud detection methods, increasing AUC by 0.5%~2.0% and F1 by 0.6%~1.8%, which verifies the effectiveness and generalization ability of the CTGANBoost method.
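
The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the normalization formula, so the rule below (each synthetic sample starts with the mean original weight, then all weights are divided by a common total) is an assumption chosen only because it keeps the ratios between original samples unchanged. The function names `normalize_with_synthetic` and `adaboost_reweight` are hypothetical, the CTGAN sampling step itself is omitted, and the update shown is the standard discrete AdaBoost rule for labels in {-1, +1}.

```python
import math


def normalize_with_synthetic(orig_weights, n_synthetic):
    """Append weights for synthetic samples and renormalize to sum to 1.

    Assumption (not from the paper): each synthetic sample starts with the
    mean weight of the original samples. Dividing every weight by the same
    total keeps the ratios between original samples unchanged.
    """
    if not orig_weights:
        raise ValueError("orig_weights must be non-empty")
    mean_w = sum(orig_weights) / len(orig_weights)
    combined = list(orig_weights) + [mean_w] * n_synthetic
    total = sum(combined)
    return [w / total for w in combined]


def adaboost_reweight(weights, y_true, y_pred):
    """One standard discrete AdaBoost update for labels in {-1, +1}.

    Expects `weights` to sum to 1, so the weighted error is a plain sum.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return [w / total for w in updated], alpha


# Three original samples plus two synthetic minority samples.
orig = [0.5, 0.3, 0.2]
w = normalize_with_synthetic(orig, n_synthetic=2)

# One boosting round on four equally weighted samples, one misclassified.
new_w, alpha = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
```

In this sketch the combined weights still sum to 1 and `w[0] / w[1]` equals `orig[0] / orig[1]`, which is one way to read "relative weights of the original dataset are maintained"; the paper's actual scheme may differ.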

Key words: Credit fraud, Imbalance data, Ensemble learning, Generative adversarial network, AdaBoost

CLC Number: 

  • TP391