Computer Science ›› 2023, Vol. 50 ›› Issue (7): 261-269. doi: 10.11896/jsjkx.220700076

• Computer Networks •

• Corresponding author: LI Yuqiang (yqli@uestc.edu.cn)

Deep Learning-based Algorithm for Active IPv6 Address Prediction

LI Yuqiang1, LI Linfeng2, ZHU Hao1, HOU Mengshu1   

  1 Information Center, University of Electronic Science and Technology of China, Chengdu 611731, China
    2 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Received: 2022-07-07 Revised: 2022-11-08 Online: 2023-07-15 Published: 2023-07-05
  • About author: LI Yuqiang, born in 1979, master, lecturer. His main research interests include computer networks and cyber security.
  • Supported by:
    Key Technologies Research and Development Program of Sichuan Science and Technology Plan (2022YFG0329).


Abstract: The huge address space of IPv6 makes a global IPv6 address scan infeasible at existing network speeds and hardware computing power. Fast IPv6 address scanning can instead be achieved by using address generation algorithms to predict the IPv6 addresses likely to appear in the network, and then using the predicted addresses as scan targets. This paper explores potential allocation patterns by analyzing IPv6 address structures and allocation methods and, building on existing language models and target generation algorithms, proposes a deep learning-based algorithm, 6LMNS, to predict potentially active IPv6 addresses. 6LMNS first constructs an IPv6 address word vector space with semantic relationships through the address vector space mapping model Add2vec. It then builds the language model GPT-IPv6 on the Transformer architecture to estimate the probability distribution of IPv6 address word sequences. Finally, nucleus sampling replaces traditional greedy search decoding to generate active addresses. Experiments show that, compared with other language models and target generation algorithms, the addresses generated by 6LMNS have better diversity as well as a higher active rate.
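To make the abstract's pipeline concrete, the sketch below illustrates two of its ingredients: splitting an exploded IPv6 address into nybble "words" (one common wordification choice in IPv6 target-generation work; the paper's actual Add2vec tokenization may differ) and nucleus (top-p) sampling over a next-word probability distribution, as used in place of greedy decoding. The `p` threshold and the per-nybble granularity are illustrative assumptions, not the paper's exact settings.

```python
import ipaddress
import random


def address_to_words(addr: str) -> list[str]:
    """Expand an IPv6 address to its full form and split it into
    32 nybble 'words', e.g. '2001:db8::1' -> ['2', '0', '0', '1', ...]."""
    exploded = ipaddress.IPv6Address(addr).exploded.replace(":", "")
    return list(exploded)


def nucleus_sample(probs: dict[str, float], p: float = 0.9) -> str:
    """Sample the next word from the smallest set of highest-probability
    candidates whose cumulative probability reaches p (top-p sampling).
    Unlike greedy search, this preserves diversity while still
    truncating the unreliable low-probability tail."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for word, prob in ranked:
        nucleus.append((word, prob))
        total += prob
        if total >= p:
            break
    words, weights = zip(*nucleus)
    return random.choices(words, weights=weights, k=1)[0]
```

A language model over such word sequences would supply `probs` at each of the 32 positions; repeatedly calling `nucleus_sample` then yields a candidate address to probe.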

Key words: Deep learning, Word2Vec, GPT, Nucleus sampling, Greedy search

CLC Number: TP393