Computer Science (计算机科学), 2022, Vol. 49, Issue (6A): 544-554. doi: 10.11896/jsjkx.210600131
PANG Xing-long (庞兴龙), ZHU Guo-sheng (朱国胜)
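Abstract: Semi-supervised learning is a machine learning paradigm that combines supervised and unsupervised learning. It uses a small number of labeled samples together with a large amount of unlabeled data. In recent years, semi-supervised learning has become a research focus for scholars in China and abroad, and it has been widely applied across many fields. With the rise of technologies such as 5G, network traffic has become increasingly complex and diverse, which poses new challenges for network security. As a result, applying semi-supervised techniques to network traffic analysis has become one of the main approaches. This paper first introduces the characteristics of current network traffic data and the ways such data are processed. It then explains the advantages of semi-supervised learning for handling network traffic and summarizes research progress on semi-supervised approaches to traffic analysis. Next, it describes practical applications of semi-supervised learning in network traffic analysis from three perspectives: semi-supervised classification, semi-supervised clustering, and semi-supervised dimensionality reduction. Finally, it identifies the open challenges and new research directions facing semi-supervised network traffic analysis.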
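The abstract frames semi-supervised learning as using a few labels to analyze a large volume of unlabeled traffic. One simple technique in this family is seeded K-means, a form of semi-supervised clustering, and a minimal sketch of it follows. The labeled flows fix the number of clusters and initialize the centroids. K-means then runs over labeled and unlabeled flows together, and each cluster keeps the class of the seeds that started it. This is an illustration under stated assumptions, not the method of any surveyed paper. The class names ("interactive", "bulk"), the two pre-scaled flow features, and the toy data are all hypothetical.

```python
from math import dist
from statistics import fmean

# A flow is a tuple of numeric features; the features used below are hypothetical.
Point = tuple[float, ...]


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(fmean(axis) for axis in zip(*points))


def nearest(p: Point, centroids: dict[str, Point]) -> str:
    """Return the class whose centroid is closest to p (Euclidean distance)."""
    return min(centroids, key=lambda c: dist(p, centroids[c]))


def seeded_kmeans(
    labeled: dict[str, list[Point]],
    unlabeled: list[Point],
    max_iter: int = 20,
) -> tuple[dict[str, Point], list[str]]:
    """Seeded K-means: one cluster per class, initialised from labeled seeds.

    Returns the final centroids and a predicted class for each unlabeled point.
    """
    # Initialisation: each class centroid is the mean of its labeled seeds.
    centroids = {c: centroid(pts) for c, pts in labeled.items()}
    points = [p for pts in labeled.values() for p in pts] + list(unlabeled)

    for _ in range(max_iter):
        # Assignment step over labeled AND unlabeled points.
        assign = [nearest(p, centroids) for p in points]
        # Update step: recompute each centroid from its current members,
        # keeping the old centroid if a cluster happens to become empty.
        updated = {}
        for c, old in centroids.items():
            members = [p for p, a in zip(points, assign) if a == c]
            updated[c] = centroid(members) if members else old
        if updated == centroids:  # centroids unchanged: converged
            break
        centroids = updated

    # Each cluster inherits its seed class, which becomes the predicted label.
    return centroids, [nearest(p, centroids) for p in unlabeled]


# Hypothetical pre-scaled 2-D flow features, e.g. (mean packet size, duration).
labeled_flows = {
    "interactive": [(1.0, 1.0), (1.2, 0.8)],
    "bulk": [(9.0, 8.0), (8.5, 9.2)],
}
unlabeled_flows = [(0.9, 1.3), (1.5, 1.1), (8.8, 8.7), (9.4, 9.0), (1.1, 0.7)]

centroids, labels = seeded_kmeans(labeled_flows, unlabeled_flows)
print(labels)  # ['interactive', 'interactive', 'bulk', 'bulk', 'interactive']
```

In this seeded variant, the labeled seed flows may migrate to another cluster during the iterations. A constrained variant would instead pin each seed to its given class. Real traffic pipelines would also need feature normalization and more dimensions. They might additionally treat flows that lie far from every centroid as possible unknown traffic, a refinement this sketch leaves out.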