Computer Science (计算机科学), 2021, Vol. 48, Issue (9): 271-277. doi: 10.11896/jsjkx.201000078

• Computer Networks •

Deep Reinforcement Learning Based UAV-Assisted SVC Video Multicast

CHENG Zhao-wei1,2, SHEN Hang1,2, WANG Yue1, WANG Min1, BAI Guang-wei1

  1 College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
  2 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
  • Received: 2020-10-14 Revised: 2021-03-15 Online: 2021-09-15 Published: 2021-09-10
  • Corresponding author: SHEN Hang (hshen@njtech.edu.cn)
  • Author contact: 18052559504@163.com
  • About author: CHENG Zhao-wei, born in 1995, postgraduate. His main research interests include space-air-ground integrated networks.
    SHEN Hang, born in 1984, Ph.D, associate professor. His main research interests include network slicing and space-air-ground integrated networks.
  • Supported by:
    National Natural Science Foundation of China (61502230), Natural Science Foundation of Jiangsu Province (BK20201357), Six Talent Peaks Project in Jiangsu Province (RJFW-020), State Key Laboratory Program for Novel Software Technology (KFKT2017B21), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX20_1079, SJCX20_0351) and the second batch of the 2019 University-Industry Collaborative Education Program of the Ministry of Education (201902182003)

Abstract: This paper proposes a scalable video multicast mechanism assisted by a UAV base station in a heterogeneous network. Combined with scalable video coding (SVC), the dynamic deployment of the UAV and the allocation of resources are considered jointly, with the goal of maximizing the overall video quality of users, stated in terms of the total number of enhancement layers they receive. Because the movement of users within the coverage of the macro base station changes the network topology, traditional heuristic algorithms struggle to cope with the complexity of user mobility. To address this, the DDPG algorithm, a deep reinforcement learning method, is used to train neural networks that decide the UAV's deployment location and its bandwidth allocation proportion. After the model converges, the learning agent can find the optimal UAV deployment and bandwidth allocation strategy in a short time. Simulation results show that the proposed scheme achieves the expected goal and outperforms an existing Q-learning-based scheme.
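
As a rough illustration of the kind of decision the abstract describes, the following Python sketch shows how a continuous DDPG-style action could be decoded into a UAV position and a bandwidth share. It is not the authors' implementation. The names `OUNoise`, `decode_action`, `soft_update`, the three-dimensional action layout and the 1000 m service area are all assumptions made for this example. The Ornstein-Uhlenbeck exploration noise and the soft target update follow the general DDPG recipe, with commonly used default parameters.

```python
import random


class OUNoise:
    """Ornstein-Uhlenbeck process (Euler-Maruyama step) for exploration noise."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.state = [mu] * dim

    def sample(self):
        # x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def decode_action(raw, area_side_m=1000.0):
    """Map a raw action in [-1, 1]^3 to (x, y, bandwidth_share).

    The layout (two coordinates plus one share in [0, 1]) is hypothetical.
    """
    a = [clip(v, -1.0, 1.0) for v in raw]
    x = (a[0] + 1.0) / 2.0 * area_side_m
    y = (a[1] + 1.0) / 2.0 * area_side_m
    share = (a[2] + 1.0) / 2.0
    return x, y, share


def soft_update(target, online, tau=0.001):
    """Polyak-average online weights into target weights (flat lists)."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


if __name__ == "__main__":
    noise = OUNoise(dim=3, seed=0)
    actor_output = [0.2, -0.5, 0.0]  # placeholder for a trained actor network
    noisy = [a + n for a, n in zip(actor_output, noise.sample())]
    print(decode_action(noisy))
```

Clipping before decoding keeps the noisy action inside the service area and keeps the bandwidth share between 0 and 1, which is one simple way to enforce such constraints; the paper may handle them differently.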

Key words: Deep reinforcement learning, Mobile Internet, Multicast, Scalable video coding(SVC), Unmanned aerial vehicles

CLC number:

  • TP393