Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240900018-9. DOI: 10.11896/jsjkx.240900018

• Network & Communication •


Online Parallel SDN Routing Optimization Algorithm Based on Deep Reinforcement Learning

WU Zongming1, CAO Jijun2, TANG Qiang1   

  1. School of Computer Science and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China
    2. School of Computer Science, National University of Defense Technology, Changsha 410073, China
  • Online: 2025-06-16  Published: 2025-06-12
  • Corresponding author: CAO Jijun (caojijun@nudt.edu.cn)
  • About author: WU Zongming, born in 1995, postgraduate (22408050278@stu.csust.edu.cn). His main research interests include new-generation information and communication networks and software-defined networking.
    CAO Jijun, born in 1979. His main research interests include high-performance interconnect networks, intelligent network management and enhanced software-defined networking for HPC.
  • Supported by:
    Scientific Research Foundation of the Education Department of Hunan Province (23A0258), Natural Science Foundation of Hunan Province (2021JJ30736, 2023JJ50331), Natural Science Foundation of Changsha (kq2014112) and National Natural Science Foundation of China (62272063).


Abstract: The routing behavior of traditional SDN traffic engineering models based on deep reinforcement learning (DRL) is often unpredictable, and traditional DRL-based routing schemes that simply apply a DRL algorithm to a communication network system are unreliable. This paper proposes an online parallel SDN routing optimization algorithm based on DRL, so as to reliably exploit the trial-and-error nature of DRL routing to improve network performance. The algorithm combines online parallel routing decision-making with offline training in the SDN framework to solve the SDN routing optimization problem. This method alleviates the reliability issues arising from an unconverged DRL model and from its exploration process, and to a certain extent it also mitigates the negative impact of the unexplainability of DRL-based intelligent routing models and of unreliable routing behavior under network emergencies. The performance of the proposed algorithm is evaluated through extensive experiments on a real network topology. Experimental results show that the proposed online parallel SDN routing optimization algorithm achieves better network performance than both a traditional DRL-based routing algorithm and the OSPF algorithm.
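
The abstract's core mechanism, running the trial-and-error DRL policy in parallel with a safe baseline, installing whichever candidate routing scores best on the current traffic matrix, and logging every outcome for offline training, can be illustrated with a small sketch. The Python below is a minimal toy under assumptions of our own (a max-link-utilization score, a fixed shortest-path baseline standing in for OSPF, and a random policy standing in for an unconverged, exploring DRL agent); max_link_utilization, online_parallel_decision and every other name here are hypothetical and do not come from the paper.

import random
from typing import Callable, Dict, List, Tuple

Route = Tuple[int, ...]                  # a path as a sequence of node ids
Demands = Dict[Tuple[int, int], float]   # (src, dst) -> traffic volume
Routing = Dict[Tuple[int, int], Route]   # (src, dst) -> chosen path

def max_link_utilization(routing: Routing, demands: Demands,
                         capacity: float = 10.0) -> float:
    """Score a routing decision by its worst-loaded link (lower is better)."""
    load: Dict[Tuple[int, int], float] = {}
    for pair, path in routing.items():
        for u, v in zip(path, path[1:]):          # walk the path edge by edge
            load[(u, v)] = load.get((u, v), 0.0) + demands[pair]
    return max(load.values(), default=0.0) / capacity

def online_parallel_decision(policies: List[Callable[[Demands], Routing]],
                             demands: Demands,
                             replay_buffer: List) -> Routing:
    """Evaluate all candidate policies in parallel on the current traffic
    matrix, install the best-scoring routing, and log every candidate's
    outcome so the DRL policy can keep training offline."""
    candidates = [p(demands) for p in policies]
    scored = sorted(((max_link_utilization(r, demands), r) for r in candidates),
                    key=lambda sr: sr[0])
    for score, routing in scored:
        replay_buffer.append((demands, routing, -score))  # reward = -utilization
    return scored[0][1]

# Toy usage on a 4-node ring 0-1-2-3-0: the baseline always answers with
# fixed shortest paths, while the "exploring" policy may pick detours.
demands: Demands = {(0, 2): 4.0, (1, 3): 3.0}
baseline = lambda d: {(0, 2): (0, 1, 2), (1, 3): (1, 2, 3)}
exploring = lambda d: {(0, 2): random.choice([(0, 1, 2), (0, 3, 2)]),
                       (1, 3): random.choice([(1, 2, 3), (1, 0, 3)])}
buffer: List = []
print(online_parallel_decision([baseline, exploring], demands, buffer))

Because the installed decision is never worse than the best candidate on the observed traffic matrix, the baseline bounds the cost of the DRL policy's exploration while the replay buffer still collects its experience, which is the reliability trade-off the abstract describes.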

Key words: Software-defined network, Deep reinforcement learning, Routing optimization

CLC number: TP393.0