Computer Science (计算机科学), 2022, Vol. 49, Issue 2: 342-352. doi: 10.11896/jsjkx.201000155
黄鑫权, 刘爱军, 梁小虎, 王桁
HUANG Xin-quan, LIU Ai-jun, LIANG Xiao-hu, WANG Heng
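Abstract: To address load imbalance in multi-hop aerial sensor networks (ASNs), this paper proposes a reinforcement-learning-based queue-efficient geographic routing (RLQE-GR) protocol. RLQE-GR first casts the ASN routing problem as a reinforcement learning (RL) task in which each UAV is modeled as an RL state and each successful one-hop packet forwarding is modeled as an RL action. Second, RLQE-GR introduces a new reward function to evaluate each action; its value depends not only on the geographic positions of the UAV nodes and the per-hop link quality but also on the available routing-queue length of the UAV nodes. Third, based on this reward function, RLQE-GR uses the Q-function to update the long-term cumulative reward (Q-value) of each action in a distributed manner, and each node forwards packets greedily according to its local Q-values. Finally, so that the Q-values across the network converge quickly while the routing performance lost during convergence is minimized, RLQE-GR updates the Q-values iteratively through a periodic beacon mechanism. Once the Q-values have converged, RLQE-GR achieves reliable and efficient multi-hop data transmission. Compared with existing geographic routing protocols, the proposed protocol takes the relative distance between nodes, the per-hop link quality, and the routing-queue utilization of intermediate nodes into account when forwarding packets. This allows RLQE-GR to balance the load in the ASN while keeping the routing hop count and the number of packet retransmissions within bounds. In addition, by exploiting RL theory, the proposed protocol can achieve near-optimal routing performance.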
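The abstract names the ingredients of the protocol (a reward built from geographic progress, per-hop link quality, and available queue length; a distributed Q-value update driven by beacons; greedy forwarding) but does not give the formulas. The following is a minimal sketch of how those pieces could fit together, not the authors' implementation. The equal-weight reward, the learning rate `alpha`, the discount `gamma`, and all names (`Neighbor`, `reward`, `update_q`, `choose_next_hop`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Neighbor:
    """State a node is assumed to learn about a neighbor from its periodic beacon."""
    node_id: str
    position: Tuple[float, float]
    link_quality: float   # per-hop delivery probability in [0, 1]
    free_queue: int       # available routing-queue slots
    queue_capacity: int
    max_q: float          # neighbor's own best Q-value toward the destination


def reward(cur_pos, dest_pos, nb, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical reward: weighted sum of progress, link quality, queue availability."""
    d_cur = math.dist(cur_pos, dest_pos)
    d_nb = math.dist(nb.position, dest_pos)
    progress = (d_cur - d_nb) / d_cur if d_cur > 0 else 0.0
    queue_avail = nb.free_queue / nb.queue_capacity
    w_p, w_l, w_q = weights
    return w_p * progress + w_l * nb.link_quality + w_q * queue_avail


def update_q(q_table: Dict[str, float], cur_pos, dest_pos,
             neighbors: List[Neighbor], alpha=0.5, gamma=0.9):
    """Standard Q-learning update, run locally each time beacons are received."""
    for nb in neighbors:
        old = q_table.get(nb.node_id, 0.0)
        target = reward(cur_pos, dest_pos, nb) + gamma * nb.max_q
        q_table[nb.node_id] = (1 - alpha) * old + alpha * target
    return q_table


def choose_next_hop(q_table: Dict[str, float], neighbors: List[Neighbor]) -> str:
    """Greedy policy: forward to the neighbor with the largest local Q-value."""
    return max(neighbors, key=lambda nb: q_table.get(nb.node_id, 0.0)).node_id


# Two neighbors with equal progress and link quality; "b" has far more queue space.
neighbors = [
    Neighbor("a", (50.0, 0.0), 0.9, free_queue=1, queue_capacity=10, max_q=0.0),
    Neighbor("b", (50.0, 0.0), 0.9, free_queue=9, queue_capacity=10, max_q=0.0),
]
q = update_q({}, (0.0, 0.0), (100.0, 0.0), neighbors)
next_hop = choose_next_hop(q, neighbors)
```

In this sketch the two candidates differ only in free queue space, so the greedy choice falls on the less loaded neighbor, which is the load-balancing behavior the abstract describes.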