Computer Science (Jisuanji Kexue), 2021, Vol. 48, Issue 11: 363-371. doi: 10.11896/jsjkx.201000008

• Computer Networks •

  • Corresponding author: GU Lin (anheeno@gmail.com)
  • About author: deze@cug.edu.cn

Reinforcement Learning Based Dynamic Basestation Orchestration for High Energy Efficiency

ZENG De-ze1, LI Yue-peng1, ZHAO Yu-yang1, GU Lin2   

  1. School of Computer Science, China University of Geosciences, Wuhan 430074, China
    2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2020-10-03 Revised: 2021-04-10 Online: 2021-11-15 Published: 2021-11-10
  • About author: ZENG De-ze, born in 1984, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include edge computing and artificial intelligence.
    GU Lin, born in 1985, Ph.D., associate professor, is a member of China Computer Federation. Her main research interests include edge computing and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61772480, 61972171, 62073300) and Open Research Projects of Zhijiang Lab (2021KE0AB02).


Abstract: The mutual promotion of mobile communication technology and the mobile communication industry has brought unprecedented prosperity to the mobile Internet. The explosion of mobile devices, the expansion of network scale, and ever-rising service-quality requirements are driving the next technological revolution in wireless networks. 5G achieves a thousand-fold improvement in service performance through dense network deployment, but co-channel interference and bursty user requests make this solution extremely energy-hungry. To enable 5G networks to provide energy-efficient, high-performance services, it is imperative to upgrade the management schemes of mobile networks. In this article, we use a short-cycle management framework with cache queues to achieve agile and smooth handling of request bursts, avoiding drastic fluctuations in service quality. We apply deep reinforcement learning to learn user distribution and communication demand, infer the load-change patterns of base stations, and then realize pre-scheduling and pre-allocation of energy, improving energy efficiency while guaranteeing quality of service. Compared with the classic DQN algorithm, the proposed two-buffer DQN algorithm converges nearly 20% faster; in terms of decision performance, it saves 4.8% of energy consumption compared with the widely used always-on strategy.
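The two-buffer replay idea mentioned above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: it assumes one buffer keeps the freshest transitions while a second keeps older ones, with every training batch mixing samples from both, and it uses a plain Q-table in place of the deep network to stay self-contained. The class name `TwoBufferDQNAgent` and all parameter values are assumptions for the sketch.

```python
import random
from collections import deque


class TwoBufferDQNAgent:
    """Q-learning agent that samples from two replay buffers.

    Sketch only: one buffer holds the freshest transitions, a second
    holds older ones, and each training batch mixes samples from both,
    which is one common way to adapt quickly to bursty, non-stationary
    request traffic while still reusing past experience.
    """

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9,
                 recent_cap=64, history_cap=1024, mix_ratio=0.5):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.gamma = lr, gamma
        self.recent = deque(maxlen=recent_cap)    # freshest transitions
        self.history = deque(maxlen=history_cap)  # displaced older ones
        self.mix_ratio = mix_ratio                # batch share drawn from `recent`

    def store(self, transition):
        # When the recent buffer is full, its oldest transition is
        # demoted to the history buffer instead of being discarded.
        if len(self.recent) == self.recent.maxlen:
            self.history.append(self.recent.popleft())
        self.recent.append(transition)

    def sample(self, batch_size):
        n_recent = min(int(batch_size * self.mix_ratio), len(self.recent))
        n_hist = min(batch_size - n_recent, len(self.history))
        return (random.sample(list(self.recent), n_recent)
                + random.sample(list(self.history), n_hist))

    def train_step(self, batch_size=16):
        # Standard one-step Q-learning update over the mixed batch.
        for s, a, r, s_next in self.sample(batch_size):
            target = r + self.gamma * max(self.q[s_next])
            self.q[s][a] += self.lr * (target - self.q[s][a])


# Toy demonstration: in every state, action 1 ("stay awake") earns
# reward 1 and action 0 ("sleep") earns 0, so the learned Q-values
# should come to prefer action 1.
random.seed(0)
agent = TwoBufferDQNAgent(n_states=2, n_actions=2)
for _ in range(300):
    s, a = random.randrange(2), random.randrange(2)
    agent.store((s, a, float(a == 1), random.randrange(2)))
for _ in range(300):
    agent.train_step()
```

Weighting the batch toward the recent buffer is what gives the agent its agility under request bursts: new traffic patterns dominate the gradient signal immediately, while the history buffer guards against forgetting quieter regimes.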

Key words: Mobile network management, Base station sleep, Heterogeneous network, Double buffer, Request burst, Deep reinforcement learning

CLC number: TP391
[1]OH E,KRISHNAMACHARI B,LIU X,et al.Toward dynamic energy efficient operation of cellular network infrastructure[J].IEEE Communications Magazine,2011,49(6):56-61.
[2]LÄHDEKORPI P,HRONEC M,JOLMA P,et al.Energy efficiency of 5G mobile networks with base station sleep modes[C]//2017 IEEE Conference on Standards for Communications and Networking (CSCN).IEEE,2017:163-168.
[3]ONIRETI O,MOHAMED A,PERVAIZ H,et al.Analytical approach to base station sleep mode power consumption and sleep depth[C]//2017 IEEE 28th Annual International Symposium on Personal,Indoor,and Mobile Radio Communications (PIMRC).IEEE,2017:1-7.
[4]ONIRETI O,MOHAMED A,PERVAIZ H,et al.A tractable approach to base station sleep mode power consumption and deactivation latency[C]//2018 IEEE 29th Annual International Symposium on Personal,Indoor and Mobile Radio Communications (PIMRC).IEEE,2018:123-128.
[5]PERVAIZ H,ONIRETI O,MOHAMED A,et al.Energy-efficient and load-proportional eNodeB for 5G user-centric networks:a multilevel sleep strategy mechanism[J].IEEE Vehicular Technology Magazine,2018,13(4):51-59.
[6]LI J,WANG H,WANG X,et al.Optimized sleep strategy based on clustering in dense heterogeneous networks[J].EURASIP Journal on Wireless Communications and Networking,2018,2018(1):1-10.
[7]RATHEESH R,VETRIVELAN P.Energy efficiency based on relay station deployment and sleep mode activation of eNBs for 4G LTE-A network[J].Automatika,2019,60(3):322-331.
[8]KLAPEZ M,GRAZIA C A,CASONI M.Energy Savings of Sleep Modes Enabled by 5G Software-Defined Heterogeneous Networks[C]//2018 IEEE 4th International Forum on Research and Technology for Society and Industry (RTSI).IEEE,2018:1-6.
[9]JAWAD A M,JAWAD H M,NORDIN R,et al.Wireless power transfer with magnetic resonator coupling and sleep/active strategy for a drone charging station in smart agriculture[J].IEEE Access,2019,7:139839-139851.
[10]WU J,BAO Y,MIAO G,et al.Base station sleeping and power control for bursty traffic in cellular networks[C]//2014 IEEE International Conference on Communications Workshops (ICC).IEEE,2014:837-841.
[11]WU J,BAO Y,MIAO G,et al.Base-station sleeping control and power matching for energy-delay tradeoffs with bursty traffic[J].IEEE Transactions on Vehicular Technology,2015,65(5):3657-3675.
[12]LIU J,KRISHNAMACHARI B,ZHOU S,et al.Deepnap:Data-driven base station sleeping operations through deep reinforcement learning[J].IEEE Internet of Things Journal,2018,5(6):4273-4282.
[13]GHADIMI E,CALABRESE F D,PETERS G,et al.A reinforcement learning approach to power control and rate adaptation in cellular networks[C]//2017 IEEE International Conference on Communications (ICC).IEEE,2017:1-7.
[14]WANG L,PETERS G,LIANG Y C,et al.Intelligent User-Centric Networks:Learning-Based Downlink CoMP Region Breathing[J].IEEE Transactions on Vehicular Technology,2020,69(5):5583-5597.
[15]LIU Q,SHI J.Base station sleep and spectrum allocation in heterogeneous ultra-dense networks[J].Wireless Personal Communications,2018,98(4):3611-3627.
[16]NIU Z,GUO X,ZHOU S,et al.Characterizing energy-delay tradeoff in hyper-cellular networks with base station sleeping control[J].IEEE Journal on Selected Areas in Communications,2015,33(4):641-650.
[17]CHEN X,WU J,CAI Y,et al.Energy-efficiency oriented traffic offloading in wireless networks:a brief survey and a learning approach for heterogeneous cellular networks[J].IEEE Journal on Selected Areas in Communications,2015,33(4):627-640.
[18]FENG M,MAO S,JIANG T.BOOST:Base station on-off switching strategy for energy efficient massive MIMO HetNets[C]//IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications.IEEE,2016:1-9.
[19]SAMARAKOON S,BENNIS M,SAAD W,et al.Opportunistic sleep mode strategies in wireless small cell networks[C]//2014 IEEE International Conference on Communications(ICC).IEEE,2014:2707-2712.
[20]SALEM F E,ALTMAN Z,GATI A,et al.Reinforcement learning approach for advanced sleep modes management in 5G networks[C]//2018 IEEE 88th Vehicular Technology Conference (VTC-Fall).IEEE,2018:1-5.
[21]EL-AMINE A,ITURRALDE M,HASSAN H A H,et al.A distributed Q-Learning approach for adaptive sleep modes in 5G networks[C]//2019 IEEE Wireless Communications and Networking Conference (WCNC).IEEE,2019:1-6.
[22]ZHANG Y T.A Deep Reinforcement Learning based Dynamic C-RAN Resource Allocation Method[J].Journal of Chinese Computer Systems,2021,42(1):132-136.