Computer Science, 2020, Vol. 47, Issue (9): 311-317. doi: 10.11896/jsjkx.191000118

• Information Security

DGA Domains Detection Based on Artificial and Depth Features

HU Peng-cheng1, DIAO Li-li2, YE Hua1, YANG Yan-lan1

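  1 School of Automation, Southeast University, Nanjing 210096, China
  2 Core Technology Research, Trend Micro China Development Center, Nanjing 210012, China
  • Received: 2019-10-18 Published: 2020-09-10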
  • Corresponding author: YE Hua (zhineng@seu.edu.cn)
  • Author contact: pengchenghu@seu.edu.cn

  • About author: HU Peng-cheng, born in 1996, postgraduate. His main research interests include pattern recognition and intelligent systems, data mining, deep learning, network security, etc.
    YE Hua, born in 1961, Ph.D. His main research interests include intelligent control, pattern recognition, computer applications, intelligent buildings, intelligent robots, fault diagnosis, etc.

Abstract: Nowadays, various families of malware use domain generation algorithms (DGAs) to generate large numbers of pseudo-random domain names, which they use to try to connect to C&C (Command and Control) servers in order to launch corresponding attacks. Existing DGA detection methods fall into two categories. The first builds artificial (hand-crafted) features from the randomness of DGA domain names and trains a machine learning classifier on them; such methods require time-consuming and laborious feature engineering and suffer from a high false alarm rate. The second uses LSTM, GRU and other deep learning models to learn the sequential patterns of DGA domain names; such methods have low detection accuracy on DGA domain names with low randomness. This paper therefore proposes a generic feature extraction scheme for domain names, builds a dataset containing 41 DGA domain families, and designs a detection algorithm that combines artificial features with depth (deep-learned) features. The combination enhances the generalization ability of the model and increases the number of DGA families it can identify. Experimental results show that the DGA domain detection algorithm based on artificial and depth features achieves higher accuracy and better generalization ability than traditional deep learning methods.

Key words: Domain generation algorithms, Domain name detection, Feature engineering, Long short-term memory
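The abstract contrasts two kinds of input: artificial features that quantify how random a domain name looks, and the raw character sequence that an LSTM learns from. The Python sketch below illustrates both under stated assumptions. The particular features (label length, Shannon entropy, digit ratio, vowel ratio, longest consonant run), the character set, the padding length of 16, and every function name are hypothetical choices taken from common DGA-detection practice; the abstract does not list the paper's actual feature set or input encoding.

```python
import math
from collections import Counter

# Assumption: lowercase letters, digits, '-' and '_' cover the label alphabet.
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789-_"
# Index 0 is reserved for padding and for any character outside CHARSET.
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(CHARSET)}
VOWELS = set("aeiou")


def second_level_label(domain: str) -> str:
    """Return the label left of the last dot ('example' for 'www.example.com').

    Simplification: multi-part public suffixes such as '.co.uk' are not handled.
    """
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def shannon_entropy(s: str) -> float:
    """Shannon entropy, in bits per character, of the characters in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def longest_consonant_run(s: str) -> int:
    """Length of the longest run of consecutive non-vowel letters."""
    best = run = 0
    for ch in s:
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def handcrafted_features(domain: str) -> dict:
    """Hypothetical randomness-oriented features of the second-level label."""
    label = second_level_label(domain)
    n = max(len(label), 1)  # avoid division by zero on an empty label
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "max_consonant_run": longest_consonant_run(label),
    }


def encode_chars(label: str, maxlen: int = 16) -> list:
    """Map a label to a fixed-length, left-padded list of character ids,
    the kind of input a character-level LSTM embedding layer consumes."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in label.lower()[:maxlen]]
    return [0] * (maxlen - len(ids)) + ids


if __name__ == "__main__":
    for d in ["google.com", "xjw3kq9zt0vbl.net"]:
        print(d, handcrafted_features(d), encode_chars(second_level_label(d)))
```

The random-looking label scores higher entropy, more digits and no vowels, which is the kind of signal artificial features capture; a low-randomness (for example, dictionary-word) DGA would look much more like `google.com` on these measures, which is the weakness the abstract attributes to purely randomness-based features. One common way to fuse the two inputs is to turn the feature dictionary into a numeric vector and concatenate it with the LSTM's output before a final classification layer; the abstract does not say how the paper performs this fusion.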

CLC number: TP393.0