DGA Domains Detection Based on Artificial and Depth Features

doi:10.11896/jsjkx.191000118

Abstract

Abstract: Many malware families use domain generation algorithms (DGAs) to generate large numbers of pseudo-random domain names and then attempt to connect to command-and-control (C&C) servers in order to launch attacks. Existing detection methods fall into two groups. The first builds handcrafted features based on the randomness of DGA domain names and uses machine learning to learn a classification model; such methods suffer from time-consuming, labor-intensive feature engineering and a high false alarm rate. The second uses deep learning techniques such as LSTM and GRU to learn the sequential relationships in DGA domain names; such methods have low detection accuracy for DGA domain names with low randomness. This paper proposes a scheme for extracting generic domain name features, builds a data set containing 41 DGA domain name families, and designs a detection algorithm based on handcrafted features and deep features, which improves the generalization ability of the model and increases the number of DGA domain families that can be identified. Experimental results show that the proposed algorithm achieves higher accuracy and better generalization ability than traditional deep learning methods.

Key words: Domain generation algorithms, Domain name detection, Feature engineering, Long short-term memory
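
As a rough illustration of what randomness-based handcrafted features for a domain name can look like, the sketch below computes a few simple statistics over the leftmost label of a domain. The abstract does not list the paper's actual feature set, so the four features here (length, character entropy, digit ratio, vowel ratio) and the function names `shannon_entropy` and `handcrafted_features` are assumptions chosen for illustration only.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def handcrafted_features(domain: str) -> dict:
    """Illustrative randomness features; not the paper's feature set."""
    # Simplification (assumption): only the leftmost label is inspected,
    # e.g. "example" from "example.com".
    label = domain.lower().split(".")[0]
    length = len(label)
    letters = [c for c in label if c.isalpha()]
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        # Fraction of characters that are digits.
        "digit_ratio": sum(c.isdigit() for c in label) / length if length else 0.0,
        # Fraction of alphabetic characters that are vowels.
        "vowel_ratio": sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0,
    }


if __name__ == "__main__":
    for name in ("google.com", "x1y2.net"):
        print(name, handcrafted_features(name))
```

A feature dictionary like this could be turned into a fixed-length vector and fed to a conventional classifier. Features of this kind tend to separate highly random names from benign ones, but names built from dictionary words may look statistically similar to benign domains.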

CLC number: TP393.0

HU Peng-cheng, DIAO Li-li, YE Hua, YANG Yan-lan. DGA Domains Detection Based on Artificial and Depth Features[J]. Computer Science, 2020, 47(9): 311-317. https://doi.org/10.11896/jsjkx.191000118


