Computer Science, 2022, Vol. 49, Issue (4): 263-268. doi: 10.11896/jsjkx.210300155

• Artificial Intelligence

Heating Strategy Optimization Method Based on Deep Learning

LI Peng1,2, YI Xiu-wen2, QI De-kang1,2, DUAN Zhe-wen2,3, LI Tian-rui1

  1 School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China;
  2 JD Intelligent Cities Research, Beijing 100176, China;
  3 School of Computer Science and Technology, Xidian University, Xi'an 710071, China
  • Received: 2021-03-15 Revised: 2021-07-25 Published: 2022-04-01
  • Corresponding author: YI Xiu-wen (yixiuwen@jd.com)
  • About author: LI Peng (lipengsx@my.swjtu.edu.cn), born in 1996, postgraduate. His main research interests include deep learning and deep reinforcement learning. YI Xiu-wen, born in 1991, Ph.D., data scientist and researcher, is a member of China Computer Federation. His main research interests include spatio-temporal data mining and deep learning.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2019YFB2101801) and the General Program of the National Natural Science Foundation of China (61773324).

Abstract: In northern China, central heating of buildings in winter is typically regulated by a climate compensator. However, this strategy relies heavily on manual experience and its regulation is relatively coarse, so optimizing the heating control strategy is important for keeping indoor temperature stable and comfortable. To this end, this paper proposes a heating strategy optimization method based on deep learning and deep reinforcement learning, which improves the original control strategy by learning from real historical data. First, to learn the thermodynamic law governing indoor temperature change, the paper develops a deep MTDN (Multiple Time Difference Network) that predicts the room temperature of the next time slot; the network is both accurate and consistent with physical laws. MTDN is then used as a simulator, and the SAC (Soft Actor-Critic) algorithm, which is based on maximum entropy reinforcement learning, is employed as the strategy optimizer that interacts with the simulator during training. An evaluation index of the human body's thermal response serves as the reward, so that a stable and effective heating control strategy is learned. Finally, using real data from a heat exchange station in Tianjin, experiments evaluate the predictive ability of the simulator and the control ability of the strategy optimizer, respectively. The results show that, compared with other types of prediction simulators, the proposed simulator not only achieves high prediction accuracy but also conforms to physical laws. Meanwhile, compared with the original strategy, the strategy learned by the optimizer keeps the indoor temperature more stable and comfortable across multiple randomly sampled time periods.

Key words: Central heating, Deep learning, Deep reinforcement learning, Heating optimization, Urban computing

CLC number: TP399
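The baseline being optimized is the climate compensator, which sets the supply-water temperature from the outdoor temperature along a heating curve. A minimal sketch of such a curve follows, assuming a linear shape and hypothetical design points (55 °C supply at -10 °C outdoor, 30 °C supply at 15 °C outdoor) plus clamping limits; the function name and every parameter value are illustrative and are not taken from the paper or from any particular controller.

```python
def compensator_supply_temp(
    outdoor_c: float,
    cold_point: tuple[float, float] = (-10.0, 55.0),  # hypothetical (outdoor, supply) design point
    mild_point: tuple[float, float] = (15.0, 30.0),   # hypothetical (outdoor, supply) point
    min_supply_c: float = 30.0,                       # hypothetical lower supply limit
    max_supply_c: float = 55.0,                       # hypothetical upper supply limit
) -> float:
    """Linear heating curve: colder outdoor air -> hotter supply water.

    The supply temperature depends only on the outdoor temperature; the
    indoor temperature is never read, so there is no room-level feedback.
    """
    (x0, y0), (x1, y1) = cold_point, mild_point
    slope = (y1 - y0) / (x1 - x0)  # degrees of supply change per degree outdoor
    supply = y0 + slope * (outdoor_c - x0)
    return max(min_supply_c, min(max_supply_c, supply))  # clamp to the allowed range


if __name__ == "__main__":
    for outdoor in [-15.0, -10.0, 0.0, 10.0, 20.0]:
        print(outdoor, compensator_supply_temp(outdoor))
```

Because the output depends on outdoor temperature alone, a curve like this cannot react to how warm the rooms actually are, which is one plausible reading of why the abstract calls this kind of regulation coarse.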
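The method uses a learned next-step room-temperature predictor (MTDN) as a simulator that the reinforcement-learning agent interacts with. The sketch below only illustrates that simulator-as-environment pattern under stated assumptions: it replaces MTDN with a hypothetical first-order thermal model that predicts the per-step temperature change (a difference-style formulation assumed here, not the paper's architecture), treats the supply-water temperature as the action (an assumption), uses a hypothetical 20-22 °C comfort band as the reward instead of the paper's thermal-response index, and adds a monotonicity check as one example of a physical law a simulator should respect (hotter supply water should not predict a colder room). All class, function and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyHeatingSimulator:
    """Hypothetical stand-in for a learned next-step room-temperature model.

    This is NOT MTDN: it is a hand-written first-order thermal model used only
    to show how a predictor can be wrapped as an RL environment.
    """
    gain_supply: float = 0.05   # hypothetical heat-input coefficient per step
    gain_loss: float = 0.03     # hypothetical heat-loss coefficient per step
    comfort_low: float = 20.0   # hypothetical comfort band, degrees C
    comfort_high: float = 22.0
    indoor_c: float = 20.0      # current room temperature (environment state)

    def predict_delta(self, indoor_c: float, outdoor_c: float, supply_c: float) -> float:
        # Difference-style prediction: model the temperature change, then add it.
        heat_in = self.gain_supply * (supply_c - indoor_c)
        heat_out = self.gain_loss * (indoor_c - outdoor_c)
        return heat_in - heat_out

    def reward(self, indoor_c: float) -> float:
        # Zero inside the comfort band, minus the distance to the band outside it.
        if indoor_c < self.comfort_low:
            return -(self.comfort_low - indoor_c)
        if indoor_c > self.comfort_high:
            return -(indoor_c - self.comfort_high)
        return 0.0

    def step(self, outdoor_c: float, supply_c: float) -> tuple[float, float]:
        # Action = supply-water temperature; returns (next room temp, reward).
        self.indoor_c += self.predict_delta(self.indoor_c, outdoor_c, supply_c)
        return self.indoor_c, self.reward(self.indoor_c)


def is_monotone_in_supply(sim, indoor_c, outdoor_c, supplies) -> bool:
    """Check one physical law: hotter supply water never predicts a colder room."""
    preds = [sim.predict_delta(indoor_c, outdoor_c, s) for s in sorted(supplies)]
    return all(later >= earlier for earlier, later in zip(preds, preds[1:]))


if __name__ == "__main__":
    sim = ToyHeatingSimulator()
    for outdoor in [-5.0, -3.0, 0.0]:
        temp, r = sim.step(outdoor_c=outdoor, supply_c=45.0)
        print(f"outdoor={outdoor:5.1f}  room={temp:5.2f}  reward={r:5.2f}")
    print(is_monotone_in_supply(sim, 20.0, 0.0, [35.0, 40.0, 45.0, 50.0]))
```

Swapping the toy model for a trained network would leave the `step`/`reward` interface unchanged, which is what lets an off-policy agent such as SAC be trained against the simulator instead of the real heating system.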
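SAC, the strategy optimizer named in the abstract, maximizes expected reward plus a bonus for policy entropy. For reference, the maximum-entropy objective and the soft value and soft Bellman relations from Haarnoja et al. (2018) are written out below, with temperature coefficient $\alpha$ and discount factor $\gamma$. Reading $s_t$ as the observed heating state, $a_t$ as the control setting and $r$ as the thermal-comfort reward is this sketch's interpretation of the abstract, not a detail stated there.

```latex
J(\pi)=\sum_{t=0}^{T}\mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\Big[r(s_t,a_t)+\alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\Big],
\qquad
\mathcal{H}\big(\pi(\cdot\mid s_t)\big)=-\mathbb{E}_{a\sim\pi}\big[\log\pi(a\mid s_t)\big]

V(s_t)=\mathbb{E}_{a_t\sim\pi}\big[Q(s_t,a_t)-\alpha\log\pi(a_t\mid s_t)\big],
\qquad
Q(s_t,a_t)=r(s_t,a_t)+\gamma\,\mathbb{E}_{s_{t+1}\sim p}\big[V(s_{t+1})\big]
```

A larger $\alpha$ rewards more exploratory (higher-entropy) control policies; as $\alpha \to 0$ the objective reduces to the standard expected-return objective.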
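The abstract reports that the learned strategy keeps indoor temperature more stable and comfortable over several randomly sampled time periods, but it does not name the metrics. One hypothetical way to score a room-temperature trace over random windows is sketched below: the population standard deviation as a stability measure and the fraction of steps inside an assumed 20-22 °C band as a comfort measure. Both metrics, the band and the function names are assumptions for illustration only.

```python
import random
import statistics


def window_metrics(trace, start, length, low=20.0, high=22.0):
    """Return (std, in_band_ratio) for trace[start:start + length].

    std: population standard deviation of room temperature (lower = steadier).
    in_band_ratio: share of steps within the hypothetical [low, high] band.
    """
    window = trace[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the trace")
    std = statistics.pstdev(window)
    in_band = sum(low <= t <= high for t in window) / length
    return std, in_band


def sample_window_metrics(trace, length, n_windows, seed=0):
    """Score n_windows randomly placed windows of the same trace."""
    rng = random.Random(seed)  # fixed seed so two strategies can share the same windows
    starts = [rng.randrange(0, len(trace) - length + 1) for _ in range(n_windows)]
    return [window_metrics(trace, s, length) for s in starts]
```

Calling `sample_window_metrics` with the same seed on a baseline trace and a learned-policy trace of equal length scores both on identical windows, which keeps the comparison paired.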