Computer Science ›› 2018, Vol. 45 ›› Issue (6A): 371-374.

• Information Security •

Study on Click Fraud Detection in Online Advertising with Imbalanced Data Processing Methods

LI Xin, GUO Han, ZHANG Xin, HU Fang-qiang, SHUAI Ren-jun

  1. College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
  • Online: 2018-06-20  Published: 2018-08-03

Abstract: Click fraud detection in online advertising is one of the most important applications of machine learning. The support vector machine (SVM) is a prominent supervised learning algorithm for classification problems on datasets with roughly balanced class distributions. When applied to click fraud detection, however, its effectiveness is severely limited by the extremely imbalanced class distribution of the FDMA2012 competition dataset. In this paper, three data preprocessing methods, random under-sampling (RUS), the synthetic minority over-sampling technique (SMOTE), and SMOTE combined with edited nearest neighbor (SMOTE+ENN), are investigated in detail, each followed by an SVM classifier. Results show that combining SMOTE+ENN with SVM achieves an accuracy of about 95% on minority-class samples, which basically meets the requirements of an online advertising click fraud detection system.
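
The three resampling strategies named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names, the toy two-dimensional data, the neighbor counts (k=5 for SMOTE, k=3 for ENN), and the restriction to two classes are all assumptions made for illustration, and the SVM training step that would follow resampling is omitted.

```python
import math
import random
from collections import Counter


def nearest(point, pool, k):
    """Return the k (features, label) rows of pool closest to point."""
    return sorted(pool, key=lambda row: math.dist(point, row[0]))[:k]


def random_under_sample(data, rng):
    """RUS: randomly keep only as many rows per class as the smallest class has."""
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append((x, y))
    n_min = min(len(rows) for rows in by_label.values())
    return [row for rows in by_label.values() for row in rng.sample(rows, n_min)]


def smote(data, minority_label, k, rng):
    """SMOTE: add interpolated minority points until two classes are balanced.

    Assumes a binary problem with at least two minority rows.
    """
    minority_rows = [row for row in data if row[1] == minority_label]
    n_needed = len(data) - 2 * len(minority_rows)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        x = rng.choice(minority_rows)[0]
        # Skip index 0: the closest minority point to x is x itself.
        neighbour = rng.choice(nearest(x, minority_rows, k + 1)[1:])[0]
        gap = rng.random()  # position along the segment from x to neighbour
        new_x = tuple(a + gap * (b - a) for a, b in zip(x, neighbour))
        synthetic.append((new_x, minority_label))
    return data + synthetic


def edited_nearest_neighbours(data, k):
    """ENN: drop every row whose label loses the vote among its k neighbours."""
    kept = []
    for i, (x, y) in enumerate(data):
        others = data[:i] + data[i + 1:]
        votes = Counter(label for _, label in nearest(x, others, k))
        if votes.most_common(1)[0][0] == y:
            kept.append((x, y))
    return kept


# Toy imbalanced data (hypothetical): 200 legitimate rows (0), 10 fraud rows (1).
rng = random.Random(0)
legit = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(200)]
fraud = [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(10)]
data = legit + fraud

rus = random_under_sample(data, rng)                      # RUS
over = smote(data, minority_label=1, k=5, rng=rng)        # SMOTE
mixed = edited_nearest_neighbours(over, k=3)              # SMOTE+ENN

for name, rows in [("RUS", rus), ("SMOTE", over), ("SMOTE+ENN", mixed)]:
    print(name, dict(Counter(y for _, y in rows)))
```

In this sketch RUS discards majority rows, SMOTE adds synthetic minority rows, and ENN then removes rows of either class that disagree with their neighborhood. Each resampled set would then be passed to a classifier such as an SVM.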

Key words: Click fraud, Imbalanced, Mixed-sampling, SVM

CLC Number: TP393