Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240900018-9. doi: 10.11896/jsjkx.240900018
WU Zongming1, CAO Jijun2, TANG Qiang1
Abstract: The routing behavior of traditional SDN traffic-engineering models based on deep reinforcement learning (DRL) is often unpredictable, and routing schemes that simply apply DRL algorithms to a communication network system are unreliable. To address this, an online parallel SDN routing optimization algorithm based on DRL is proposed, which improves network performance by reliably exploiting the trial-and-error nature of DRL routing. Within the SDN framework, the algorithm combines online parallel routing decisions with offline training to solve the SDN routing optimization problem. This approach mitigates the reliability problems caused by an unconverged DRL model and by its exploration process, and to some extent it also alleviates the negative effects of the DRL routing model's lack of interpretability and of unreliable routing behavior under sudden network conditions. Extensive experiments on a real network topology are conducted to evaluate the performance of the proposed algorithm. The results show that the proposed online parallel SDN routing optimization algorithm achieves better network performance than traditional DRL-based routing algorithms and OSPF.
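The online-parallel idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration (all class and method names are assumptions, not the paper's implementation): a DRL policy proposes a route in parallel with a safe baseline such as OSPF, a reliability guard deploys the DRL route only when a network model predicts acceptable link utilization, and every decision is logged for offline training.

```python
class OnlineParallelRouter:
    """Hypothetical sketch: DRL routing runs in parallel with a safe
    baseline (e.g. OSPF); the decision predicted to be reliable is
    deployed, and experience is stored for offline training."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold   # max acceptable predicted link utilization
        self.replay_buffer = []      # experience collected for offline training

    def ospf_route(self, flow):
        # placeholder for the shortest-path baseline route
        return ["shortest", "path"]

    def drl_route(self, flow):
        # placeholder for the (possibly unconverged) DRL policy output
        return ["learned", "path"]

    def predicted_utilization(self, route):
        # placeholder network-model estimate of the route's max link utilization
        return 0.5 if route[0] == "learned" else 0.7

    def decide(self, flow):
        candidate = self.drl_route(flow)
        baseline = self.ospf_route(flow)
        util = self.predicted_utilization(candidate)
        # reliability guard: deploy the DRL route only if predicted safe,
        # otherwise fall back to the baseline route
        chosen = candidate if util < self.threshold else baseline
        self.replay_buffer.append((flow, chosen, util))  # log for offline training
        return chosen


router = OnlineParallelRouter()
print(router.decide("flow-1"))
```

The key design point, per the abstract, is that online decision-making never waits on (or trusts) an unconverged model alone; the baseline path keeps the network safe while the DRL model trains offline on the logged experience.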
CLC Number: