Computer Science ›› 2023, Vol. 50 ›› Issue (8): 251-259. doi: 10.11896/jsjkx.220700277

• Information Security •

Survey of DGA Domain Name Detection Based on Character Feature

WANG Yu1, WANG Zuchao1, PAN Rui2

  1 School of Science, China University of Geosciences (Beijing), Beijing 100083, China
  2 China Academy of Information and Communications Technology, Beijing 100191, China
  • Received: 2022-07-28 Revised: 2022-11-24 Online: 2023-08-15 Published: 2023-08-02
  • Corresponding author: PAN Rui (panrui@caict.ac.cn)
  • About author: (18847163202@163.com) WANG Yu, born in 1996, postgraduate, is a member of China Computer Federation. Her main research interests include data mining and deep learning in DGA domain name detection.
    PAN Rui, born in 1988, holds a master's degree and is a senior engineer. His main research interests include cyber security, data governance and data security.
  • Supported by:
    National Natural Science Foundation of China (62071152).

Abstract: Domain generation algorithms (DGAs) can produce large numbers of random domain names, and in recent years botnets have widely adopted DGA domain names to improve their stealth. Detecting DGA domain names efficiently is therefore important for discovering botnets and ensuring cyber security. Character-feature-based DGA domain name detection completes the detection using only the domain name string. It is a real-time detection method and has become a hot spot in DGA domain name detection research in recent years. A review of such methods shows that both traditional machine learning and deep learning algorithms can detect DGA domain names effectively. However, for wordlist-based DGA domain names, short DGA domain names, and new DGA variants, detection ability still needs to be improved, for example by refining the word embedding method, introducing attention mechanisms, or adding adversarial examples. Finally, this paper summarizes character-feature-based DGA domain name detection methods, analyzes the advantages and existing problems of the different methods, and proposes future research directions and key issues that need to be addressed.

Key words: Cyber security, DGA domain name detection, Machine learning, Deep learning, Word embedding, Attention mechanism, Adversarial example
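
As a rough illustration of what "using only the domain name string" can mean in practice, the sketch below computes a few simple character statistics for a domain name. The feature set (length, Shannon entropy, digit ratio, vowel ratio) and the function name `char_features` are assumptions chosen for illustration; they are not taken from any specific method surveyed in the paper, and the input is assumed to be a registered domain such as `example.com`.

```python
import math
from collections import Counter

VOWELS = set("aeiou")

def char_features(domain: str) -> dict:
    """Compute simple character statistics for the leftmost label of a domain.

    Hypothetical feature set for illustration only.
    """
    # Assumption: the leftmost label carries the (possibly generated) name.
    label = domain.lower().strip(".").split(".")[0]
    n = len(label)
    if n == 0:
        return {"length": 0, "entropy": 0.0, "digit_ratio": 0.0, "vowel_ratio": 0.0}

    counts = Counter(label)
    # Shannon entropy in bits: sum over characters of p * log2(1 / p).
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    return {
        "length": n,
        "entropy": entropy,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
    }
```

Statistics of this kind could serve as input to a traditional machine learning classifier, whereas the deep learning approaches mentioned in the abstract typically learn representations from the character sequence directly.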

CLC Number:

  • TP393.08