Computer Science (计算机科学), 2026, Vol. 53, Issue 4: 415-423. doi: 10.11896/jsjkx.250900139
张灿1, 栗维勋2, 汪明3, 詹雄3, 颉子光4, 韩东岐1, 王之梁1, 杨家海1
ZHANG Can1, LI Weixun2, WANG Ming3, ZHAN Xiong3, XIE Ziguang4, HAN Dongqi1, WANG Zhiliang1, YANG Jiahai1
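Abstract: Malicious traffic identification is a key task in network security defense, and the quality of the training data directly determines the accuracy of the identification model. However, privacy protection, labeling cost, and class imbalance make real data very difficult to obtain. To address these challenges, this paper proposes a fine-grained network traffic generation method based on a pretraining and fine-tuning model. The method first designs a static tokenization scheme that preserves protocol structure information, converting raw traffic into a sequence representation that retains protocol semantics and can be learned by an autoregressive model. On this basis, a two-stage pretraining and fine-tuning generation framework is built: the model first learns general protocol and temporal patterns from large-scale benign traffic, and is then fine-tuned on labeled malicious traffic for the target task, generating high-fidelity samples with explicit attack semantics. To validate the generation method, experiments were designed along several dimensions. The results show that the proposed method outperforms mainstream baseline models in protocol compliance (a pass rate of up to 99.95% on checks based on domain expert knowledge), distribution similarity (an Earth mover's distance of only 0.0059 between the generated and real distributions), and generation diversity (coverage of real-sample neighborhoods above 50%). In the malicious traffic identification task trained on generated traffic, the proposed method is the only one among the compared methods that improves detection performance across multiple classifiers. In addition, a malicious-function verification experiment confirms the attack effect of the generated traffic in two attack scenarios. The experimental results indicate that the proposed method can generate fine-grained malicious traffic that is syntactically compliant, statistically similar, and semantically and functionally correct, offering an effective technical approach to the scarcity of traffic data in network security.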
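The idea of a tokenization that preserves protocol structure can be illustrated with a minimal sketch. The abstract does not specify the actual vocabulary or field layout, so everything below is an assumption made for illustration: the field names, the `name=hex` token format, the `<ip>` delimiters, and the restriction to the fixed 20-byte IPv4 header are hypothetical and are not the authors' scheme. The point is only that each token maps to one protocol field, so a generated sequence can be converted back to bytes and checked field by field.

```python
import struct

# Hypothetical field layout for the fixed 20-byte IPv4 header:
# (field name, byte offset, struct format code). Illustration only;
# the paper's real tokenization scheme is not given in the abstract.
IPV4_FIELDS = [
    ("ver_ihl", 0, "B"),
    ("tos", 1, "B"),
    ("total_len", 2, "H"),
    ("ident", 4, "H"),
    ("flags_frag", 6, "H"),
    ("ttl", 8, "B"),
    ("proto", 9, "B"),
    ("checksum", 10, "H"),
    ("src", 12, "I"),
    ("dst", 16, "I"),
]
FIELD_BY_NAME = {name: (offset, fmt) for name, offset, fmt in IPV4_FIELDS}


def tokenize_ipv4_header(header: bytes) -> list[str]:
    """Turn a raw IPv4 header into one token per protocol field."""
    if len(header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    tokens = ["<ip>"]
    for name, offset, fmt in IPV4_FIELDS:
        (value,) = struct.unpack_from("!" + fmt, header, offset)
        width = struct.calcsize("!" + fmt) * 2  # hex digits for this field
        tokens.append(f"{name}={value:0{width}x}")
    tokens.append("</ip>")
    return tokens


def detokenize_ipv4_header(tokens: list[str]) -> bytes:
    """Rebuild the 20 header bytes from field tokens (inverse of tokenize)."""
    buf = bytearray(20)
    for token in tokens[1:-1]:  # skip the <ip> ... </ip> delimiters
        name, hex_value = token.split("=")
        offset, fmt = FIELD_BY_NAME[name]
        struct.pack_into("!" + fmt, buf, offset, int(hex_value, 16))
    return bytes(buf)


# A sample header: TCP from 192.168.0.1 to 192.168.0.199, TTL 64.
SAMPLE_HEADER = bytes.fromhex("4500003c1c4640004006b1e6c0a80001c0a800c7")
SAMPLE_TOKENS = tokenize_ipv4_header(SAMPLE_HEADER)
```

Because tokens align with field boundaries, a compliance check such as "the `proto` field holds a known protocol number" reduces to inspecting a single token, which is one plausible reason a field-aware scheme helps generated traffic stay protocol-compliant.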