Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220400122-6. doi: 10.11896/jsjkx.220400122

• Information Security •

DGA Domain Name Detection Method Based on Similarity

SUN Haidong1, LIU Wanping1, HUANG Dong2

  1. 1 College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China;
    2 Key Laboratory of Advanced Manufacturing Technology of the Ministry of Education, Guizhou University, Guiyang 550025, China
  • Online: 2023-06-10 Published: 2023-06-12
  • Corresponding author: LIU Wanping (wpliu@cqut.edu.cn)
  • Author e-mail: 635675411@qq.com
  • About author: SUN Haidong, born in 1997, postgraduate. His main research interests include cyber security and domain name detection. LIU Wanping, born in 1986, Ph.D., associate professor, master's supervisor, is a member of China Computer Federation. His main research interests include network and information security.
  • Supported by:
    Natural Science Foundation of Chongqing, China (cstc2021jcyj-msxmX0594) and Science and Technology Research Project of Chongqing Education Commission (KJQN201901101).

Abstract: Botnets pose a serious threat to the Internet. Malicious activities that rely on botnets, such as distributed denial-of-service attacks and spam, can cause heavy losses to their targets. Because botnet communication relies mainly on domain names produced by domain generation algorithms (DGAs), such domain names need to be detected. Existing detection methods mainly extract domain name features from character encodings and then classify them with neural networks. Since they consider only character features, their accuracy in detecting DGA domain names is often low. To detect DGA domain names accurately, this paper proposes methods for calculating domain name character similarity and domain name node similarity, and detects DGA domain names according to these similarities. First, a model with a bidirectional gated recurrent unit (Bi-GRU) neural network as the base learner is built to screen out DGA domain names with obvious features from the dataset. Then, a recurrent neural network is used to cluster the screened DGA domain names. Finally, the similarity between each domain name to be detected in the dataset and the DGA domain names is calculated, and domain names whose similarity exceeds a threshold are classified as DGA domain names. Experimental results show that the method achieves an accuracy of up to 99.03% on datasets containing multiple classes of DGA domain names.

Key words: DGA domain name, Botnet, Domain name detection, Similarity calculation, Gated recurrent unit
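The final step described in the abstract, classifying a domain name as DGA when its similarity to known DGA domain names exceeds a threshold, can be illustrated with a minimal sketch. This page does not give the paper's character-similarity or node-similarity formulas, so the sketch substitutes Jaccard similarity over character bigrams as a stand-in measure. The function names, the default threshold of 0.5, and the example strings are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set


def char_bigrams(label: str) -> Set[str]:
    """Return the set of character bigrams of a domain label."""
    label = label.lower()
    return {label[i:i + 2] for i in range(len(label) - 1)}


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of two labels' bigram sets.

    Assumption: this is only a stand-in for the paper's
    character similarity, which is not defined on this page.
    """
    grams_a, grams_b = char_bigrams(a), char_bigrams(b)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def max_similarity(candidate: str, known_dga: Iterable[str]) -> float:
    """Highest similarity between the candidate and any known DGA label."""
    return max(
        (jaccard_similarity(candidate, d) for d in known_dga),
        default=0.0,
    )


def is_dga(candidate: str, known_dga: Iterable[str],
           threshold: float = 0.5) -> bool:
    """Flag the candidate when its best similarity exceeds the threshold.

    The threshold value is a hypothetical placeholder.
    """
    return max_similarity(candidate, known_dga) > threshold


if __name__ == "__main__":
    # Made-up labels used purely for illustration.
    known = ["abcd"]
    print(jaccard_similarity("abce", "abcd"))
    print(is_dga("abce", known, threshold=0.4))
```

In this sketch the set of known DGA labels plays the role of the domain names screened out in the first step; the clustering step and the node-similarity measure are omitted.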

CLC Number: 

  • TP393