Computer Science, 2022, Vol. 49, Issue (11A): 210800237-6. doi: 10.11896/jsjkx.210800237

• Information Security •

Detection of Malicious Behavior in Encrypted Traffic Based on Heuristic Search Feature Selection

YU Sai-sai1, WANG Xiao-juan2, ZHANG Qian-qian3

  1. 1 The 30th Research Institute of China Electronics Technology Group Corporation, Chengdu 610096, China
    2 School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100089, China
    3 Library, Naval Petty Officer Academy, Bengbu, Anhui 233040, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: WANG Xiao-juan (wj2718@163.com)
  • Author contact: (734641272@qq.com)
  • About author: YU Sai-sai, born in 1982, Ph.D, senior engineer. His main research interests include cyber security.
    WANG Xiao-juan, Ph.D, associate professor. Her main research interests include cyber security, complex networks and deep learning.

Abstract: As encrypted traffic accounts for a growing share of network traffic, more malicious behavior is hidden within it, and the network security situation is becoming increasingly severe. Encrypted traffic that carries malicious behavior exhibits many traffic features, and these features are partly redundant with one another. Redundant features increase detection time and reduce the efficiency of the detection model. Following the principle of heuristic search, this paper screens the many different features of encrypted traffic to find a representative feature combination. First, features are ranked by importance using the random forest algorithm, and those with a large influence on the classification result are retained. Then, the Pearson correlation coefficient is used to compute the similarity between all features, and a combination of relatively independent features is selected. Experimental results on the CTU-13 dataset show that selecting a representative feature combination reduces detection time and improves the efficiency of detecting malicious behavior in encrypted traffic without lowering detection accuracy.

Key words: Encrypted traffic, Malicious behavior, Heuristic search strategy, Feature selection
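
The two-stage selection summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that importance scores have already been computed (for example by a random forest) and are passed in as a plain dictionary. The feature names, the `top_k` cutoff and the `max_corr` threshold are hypothetical values chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        # Assumption: a constant column is treated as uncorrelated.
        return 0.0
    return cov / (std_x * std_y)


def select_features(columns, importance, top_k=3, max_corr=0.9):
    """Pick important features that are not strongly correlated with each other.

    columns:    dict mapping feature name -> list of values (one per flow)
    importance: dict mapping feature name -> importance score
    """
    # Stage 1: rank by importance and keep the top_k candidates.
    ranked = sorted(importance, key=importance.get, reverse=True)[:top_k]

    # Stage 2: walk the ranking greedily and keep a candidate only if its
    # absolute correlation with every already-kept feature is small enough.
    selected = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[kept])) <= max_corr
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical toy data: five flows, four features.
columns = {
    "bytes_out": [100, 200, 300, 400, 500],
    "packets_out": [10, 20, 30, 40, 50],  # scales exactly with bytes_out
    "duration": [5, 1, 4, 2, 3],
    "tls_version": [3, 3, 3, 3, 3],
}
importance = {
    "bytes_out": 0.40,
    "packets_out": 0.30,
    "duration": 0.25,
    "tls_version": 0.05,
}

print(select_features(columns, importance, top_k=3, max_corr=0.9))
```

In this toy data `packets_out` is dropped because it is perfectly correlated with the higher-ranked `bytes_out`, while `tls_version` never reaches the second stage because of its low importance score. The greedy order means that when two features are redundant, the one with the higher importance is the one that is kept.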

CLC Number:

  • TP309