Computer Science, 2021, Vol. 48, Issue (7): 62-69. doi: 10.11896/jsjkx.200600022

Special Issue: Artificial Intelligence Security


Detection of Abnormal Flow of Imbalanced Samples Based on Variational Autoencoder

ZHANG Ren-jie, CHEN Wei, HANG Meng-xin, WU Li-fa   

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2020-06-02 Revised: 2020-08-23 Online: 2021-07-15 Published: 2021-07-02
  • About author: ZHANG Ren-jie, born in 1995, M.S. candidate, is a student member of China Computer Federation. His main research interests include network security and machine learning.
    CHEN Wei, born in 1979, Ph.D., professor, is a member of China Computer Federation. His main research interests include wireless network security and mobile Internet security.
  • Supported by:
    National Key Research and Development Project (2019YFB2101704).

Abstract: With the rapid development of machine learning, more and more machine learning algorithms are being used to detect and analyze attack traffic. However, attack traffic usually accounts for only a very small portion of network traffic, so the positive and negative samples in the training set are often imbalanced, which degrades model training. To address this problem, an imbalanced-sample generation method based on the variational auto-encoder (VAE) is proposed. The idea is not to expand all minority samples, but to analyze them and expand only the small number of boundary samples that are most likely to confuse the machine learning model. First, the KNN algorithm is used to screen out the minority samples that are closest to the majority samples. Second, the DBSCAN algorithm is used to cluster the samples selected by KNN into one or more sub-clusters. Then, a VAE network model is designed to learn and expand the minority samples in each sub-cluster identified by DBSCAN, and the expanded samples are added to the original samples to build a new training set. Finally, the newly constructed training set is used to train a decision tree classifier to detect abnormal traffic. Recall and F1 score are selected as the evaluation metrics, and training on the original samples, on SMOTE-generated samples, and on samples generated by the proposed method is compared. The experimental results show that the decision tree classifier trained with the proposed method achieves higher recall and F1 score on four types of anomalies. The F1 score improves by up to 20.9%, and it is higher than with the original samples and with the SMOTE method.
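
The first two steps described in the abstract (KNN screening of boundary samples, then DBSCAN sub-clustering) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The boundary criterion (mean distance to the k nearest majority samples), the function names `select_boundary` and `dbscan`, the parameter values, and the toy 2-D data are all assumptions made for illustration. The VAE generation step and the decision tree classifier are omitted.

```python
import math
from collections import deque


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_boundary(minority, majority, k=3, n_keep=4):
    """Keep the n_keep minority samples with the smallest mean distance
    to their k nearest majority samples (an assumed boundary criterion)."""
    scored = []
    for idx, sample in enumerate(minority):
        nearest = sorted(distance(sample, other) for other in majority)[:k]
        scored.append((sum(nearest) / len(nearest), idx))
    scored.sort()
    return [minority[idx] for _, idx in scored[:n_keep]]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, -1 for noise."""
    def region(i):
        return [j for j, q in enumerate(points) if distance(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)
    return labels


# Toy 2-D data (invented for illustration): six "normal" flows near the origin,
# four attack flows close to them, and two attack flows far away.
majority = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.0, 0.0), (0.0, 1.0)]
minority = [(1.5, 0.0), (1.6, 0.1), (0.0, 1.5), (0.1, 1.6), (6.0, 6.0), (7.0, 7.0)]

boundary = select_boundary(minority, majority, k=3, n_keep=4)
labels = dbscan(boundary, eps=0.5, min_pts=2)

# Group boundary samples by sub-cluster; each group would go to the generator.
sub_clusters = {}
for point, label in zip(boundary, labels):
    if label != -1:
        sub_clusters.setdefault(label, []).append(point)

for label, members in sorted(sub_clusters.items()):
    print(label, members)
```

In this sketch, the two distant attack samples are screened out and only the samples near the majority class are clustered. Under the method described in the abstract, each resulting sub-cluster would then be used to train a generative model that produces additional minority samples for the training set.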

Key words: Abnormal flow, DBSCAN, Imbalanced sample, KNN, Oversampling, Variational auto-encoder

CLC Number: TP391