Computer Science (计算机科学), 2022, Vol. 49, Issue (4): 263-268. doi: 10.11896/jsjkx.210300155
李鹏1,2, 易修文2, 齐德康1,2, 段哲文2,3, 李天瑞1
LI Peng1,2, YI Xiu-wen2, QI De-kang1,2, DUAN Zhe-wen2,3, LI Tian-rui1
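Abstract: In northern China, central heating of buildings in winter is usually controlled by a climate compensator. This strategy relies heavily on manual experience and adjusts the heat supply only coarsely, so optimizing the heating control strategy is important for keeping indoor temperatures stable and comfortable. To this end, a deep-learning-based heating strategy optimization method is proposed that improves the original control strategy by learning from real historical data. First, with the goal of learning the thermodynamic laws that govern indoor temperature change, a deep Multiple Time Difference Network (MTDN) is proposed to predict the indoor temperature at the next time step; the network is both accurate and consistent with physical laws. Then, MTDN is used as a simulator, an evaluation index that characterizes human thermal response serves as the reward term, and the SAC (Soft Actor-Critic) algorithm, which is based on maximum-entropy reinforcement learning, is trained as the policy optimizer through interaction with the simulator, yielding a stable, well-performing heating control policy. Finally, experiments on real data from a heat exchange station in Tianjin evaluate the predictive ability of the simulator and the control ability of the policy optimizer. The results show that, compared with other types of prediction simulators, the proposed simulator achieves high prediction accuracy and conforms to physical laws. Compared with the original strategy, the policy learned by the optimizer keeps the indoor temperature more stable and comfortable across multiple randomly sampled time periods.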
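The simulator-plus-optimizer loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `simulate_next_temp` is a hand-written first-order heat balance standing in for a learned simulator such as MTDN, `comfort_reward` is a simple distance to a hypothetical 21 °C target rather than the human thermal response index the authors use, and `compensator_policy` is a made-up linear heating curve in the spirit of a climate compensator. In the described method, a SAC agent would take the place of the hand-written policy.

```python
COMFORT_TEMP_C = 21.0  # hypothetical comfort target, not a value from the paper


def simulate_next_temp(indoor_c, outdoor_c, supply_c, k_loss=0.1, k_heat=0.05):
    """Placeholder simulator: first-order heat balance (NOT the paper's MTDN)."""
    heat_gain = k_heat * (supply_c - indoor_c)    # heat delivered by supply water
    heat_loss = k_loss * (indoor_c - outdoor_c)   # heat lost to the outside
    return indoor_c + heat_gain - heat_loss


def comfort_reward(indoor_c, target_c=COMFORT_TEMP_C):
    """Stand-in reward: negative distance from the assumed comfort target."""
    return -abs(indoor_c - target_c)


class HeatingEnv:
    """Tiny environment wrapping the simulator; observation is (indoor, outdoor)."""

    def __init__(self, outdoor_series, start_indoor_c=20.0):
        self.outdoor_series = list(outdoor_series)
        self.start_indoor_c = start_indoor_c
        self.reset()

    def reset(self):
        self.t = 0
        self.indoor_c = self.start_indoor_c
        return (self.indoor_c, self.outdoor_series[0])

    def step(self, supply_c):
        outdoor_c = self.outdoor_series[self.t]
        self.indoor_c = simulate_next_temp(self.indoor_c, outdoor_c, supply_c)
        reward = comfort_reward(self.indoor_c)
        self.t += 1
        done = self.t >= len(self.outdoor_series)
        # Clamp the index so the final observation still has an outdoor value.
        next_outdoor = self.outdoor_series[min(self.t, len(self.outdoor_series) - 1)]
        return (self.indoor_c, next_outdoor), reward, done


def compensator_policy(obs):
    """Hypothetical linear heating curve: colder outside means hotter supply water."""
    _, outdoor_c = obs
    return 45.0 - 1.0 * outdoor_c


def rollout(env, policy):
    """Run one episode and return the summed reward for the given policy."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    env = HeatingEnv(outdoor_series=[-5.0, -3.0, 0.0, 2.0])
    print("compensator return:", rollout(env, compensator_policy))
```

Calling `rollout` with different policies on the same outdoor temperature series gives a like-for-like comparison of their returns, which mirrors in miniature how a learned policy could be compared against the original strategy on sampled time periods.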