一种基于修正机制和强化学习的作业车间调度问题的优化算法

doi:10.11896/jsjkx.220900112

Abstract

Abstract: In recent years,research on using deep reinforcement learning to solve job shop scheduling problems has concentrated on construction techniques,which model the scheduling problem as sequential choice problems and gradually select scheduling nodes for a complete solution.Although this algorithmic theory has produced impressive results,it still suffers from complicated reward formulation and poor solution quality,which prevents its future development.In this study,we design a reinforcement learning construction framework based on graphical neural networks and proximal policy optimisation algorithms,and an innovative and efficient search correction mechanism with a modified swap operator is proposed to enhance the solution quality.It searches the area around a known solution using a Monte Carlo tree,correcting the issue of suboptimal solution selection caused by the discrepancy between training and testing data.The proposed algorithm is comprehensively investigated on public and synthetic datasets.Experimental results demonstrate that the algorithm outperforms the state-of-the-art reinforcement learning framework on both small and medium-sized examples.It not only fully exploits the advantages of rapid solution of constructive reinforcement learning framework,but also effectively corrects the sub-optimal choice through the correction mechanism,reducing the maximum completion time in worst cases.

Key words: Scheduling, Job shop scheduling problems, Reinforcement learning, Modified search algorithms

CLC Number:

TP181

MIAO Kuan, LI Chongshou. Optimization Algorithms for Job Shop Scheduling Problems Based on Correction Mechanisms and Reinforcement Learning[J].Computer Science, 2023, 50(6): 274-282.

References

[1]GLOVER F,LANGUNA M.Tabu search[M]//Handbook of combinatorial optimization.Boston:Springer,1998:2093-2229.
[2]SILVER D,HUBERT T,SCHRITTWIESER J,et al.A general reinforcement learning algorithm that masters chess,shogi,and Go through self-play[J].Science,2018,362(6419):1140-1144.
[3]VINYALS O,BABUSCHKIN I,CZARNECKI W M,et al.Grandmaster level in StarCraft II using multi-agent reinforcement learning[J].Nature,2019,575(7782):350-354.
[4]BERNER C,BROCKMAN G,CHAN B,et al.Dota 2 with large scale deep reinforcement learning[J].arXiv:1912.06680,2019.
[5]GAREY M R,JOHNSON D S,SETHI R.The complexity of flowshop and jobshop scheduling[J].Mathematics of Operations Research,1976,1(2):117-129.
[6]GIFFLER B,THOMPSON G L.Algorithms for solving production-scheduling problems[J].Operations Research,1960,8(4):487-503.
[7]HAUPT R.A survey of priority rule-based scheduling[J].Operations-Research-Spektrum,1989,11(1):3-16.
[8]GEIGER C D,UZSOY R,AYTUGˇ H.Rapid modeling and discovery of priority dispatching rules:An autonomous learning approach[J].Journal of Scheduling,2006,9(1):7-34.
[9]ZHANG C,SONG W,CAO Z,et al.Learning to dispatch for job shop scheduling via deep reinforcement learning[J].Advances in Neural Information Processing Systems,2020,33:1621-1632.
[10]PARK J,CHUN J,KIM S H,et al.Learning to schedule job-shop problems:representation and policy learning using graph neural network and reinforcement learning[J].International Journal of Production Research,2021,59(11):3360-3377.
[11]PARK J,BAKHTIYAR S,PARK J.Schedu-leNet:Learn tosolve multi-agent scheduling problems with reinforcement lear-ning[J].arXiv:2106.03051,2021.
[12]TASSEL P,GEBSER M,SCHEKOTIHIN K.A reinforcement learning environment for job-shop scheduling[J].arXiv:2104.03760,2021.
[13]MONACI M,AGASUCCI V,GRANI G.An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem[J].arXiv:2110.09076,2021.
[14]BOGYRBAYEVA A,MERALIYEV M,MUS-TAKHOV T,et al.Learning to Solve Vehicle Routing Problems:A Survey[J].arXiv:2205.02453,2022.
[15]XIN L,SONG W,CAO Z,et al.Generative Adversarial Training for Neural Combinatorial Optimization Models[J/OL].https://xs2.zidianzhan.net/scholar?hl=zh-CN&as_sdt=0%2C5&q=Generative+Adversarial+Training+for+Neural+Combinatorial+Optimization+Models&btnG=.
[16]KOOL W,VAN HOOF H,WELLING M.Att-ention,learn to solve routing problems![J].arXiv:1803.08475,2018.
[17]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximalpolicy optimization algorithms[J].arXiv:1707.06347,2017.
[18]BENGIO Y,LODI A,PROUVOST A.Machine learning forcombinatorial optimization:a methodological tour d'horizon[J].European Journal of Operational Research,2021,290(2):405-421.
[19]JOSHI C K,LAURENT T,BRESSON X.An efficient graphconvolutional network technique for the travelling salesman problem[J].arXiv:1906.01227,2019.
[20]LU H,ZHANG X,YANG S.A learning-based iterative method for solving vehicle routing problems[C]//International Confe-rence on Learning Representations.2019.
[21]ROY A,SAFFAR M,VASWANI A,et al.Efficient content-based sparse attention with routing transformers[J].Transactions of the Association for Computational Linguistics,2021,9:53-68.
[22]ZHOU J,CUI G,HU S,et al.Graph neural networks:A review of methods and applications[J].AI Open,2020,1:57-81.
[23]DWIVEDI V P,JOSHI C K,LAURENT T,et al.Benchmarking graph neural networks[J].arXiv:2003.00982,2020.
[24]KHALIL E,DAI H,ZHANG Y,et al.Learning combinatorial optimization algorithms over graphs[J].Advances inNeural Information Processing Systems,2017,30:6348-6358.
[25]XU K,HU W,LESKOVEC J,et al.How powerful are graphneural networks?[J].arXiv:1810.00826,2018.
[26]PERRON L,FURNON V.Or-tools.UR-L [OL].https://de-velopers.google.com/optimization.
[27]ZENG Y,LIAO Z,DAI Y,et al.Hybrid intelligence for dynamic job-shop scheduling with deep reinforcement learning and attention mecha-nism[J].arXiv:2201.00548,2022.
[28]MONACI M,AGASUCCI V,GRANI G.An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem[J].arXiv:2110.09076,2021.
[29]HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural computation,1997,9(8):1735-1780.
[30]NI F,HAO J,LU J,et al.A Multi-Graph Attributed Reinforcement Learning Based Optimization Algorithm for Large-Scale Hybrid Flow Shop Scheduling Problem[C]//Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining.2021:3441-3451.
[31]NAWAZ M,ENSCORE JR E E,HAM I.A heuristic algorithm for the m-machine,n-job flow-shop sequencing problem[J].Omega,1983,11(1):91-95.
[32]OREN J,ROSS C,LEFAROV M,et al.SOLO:search online,learn offline for combinatorial optimization problems[C]//Proceedings of the International Symposium on Combinatorial Search.2021:97-105.
[33]WANG R,HUA Z,LIU G,et al.A bi-level framework forlearning to solve combinatorial optimization on graphs[J].Advances in Neural Information Processing Systems,2021,34:21453-21466.
[34]XIN L,SONG W,CAO Z,et al.Step-wise deep learning models for solving routing problems[J].IEEE Transactions on Indus-trial Informatics,2020,17(7):4861-4871.
[35]KIPF T N,WELLING M.Semi-supervised classification withgraph convolutional networks[J].arXiv:1609.02907,2016.
[36]VELICˇKOVIC＇ P,CUCURULL G,CASA-NOVA A,et al.Graph attention networks[J].arXiv:1710.10903,2017.
[37]HE K,GKIOXARI G,DOLLÁR P,et al.Mask r-cnn[C]//Proceedings of the IEEEInternational Conference on Computer Vision.2017:2961-2969.
[38]SCHULMAN J,LEVINE S,ABBEEL P,et al.Trust region po-licy optimization[C]//International Conference on Machine Learning.PMLR,2015:1889-1897.
[39]ENGSTROM L,ILYAS A,SANTURKAR S,et al.Implementation matters in deep policy gradients:A case study on PPO and TRPO[J].arXiv:2005.12729,2020.
[40]BROWNE C B,POWLEY E,WHITEHOUSE D,et al.A survey of monte carlo tree search methods[J].IEEE Transactions on Computational Intelligence and AI in games,2012,4(1):1-43.
[41]COULOM R.Efficient selectivity and backup operators inMonte-Carlo tree search[C]//International Conference on Computers and Games.Berlin:Springer,2006:72-83.
[42]SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[43]SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of go without human knowledge[J].Nature,2017,550(7676):354-359.
[44]TAILLARD E.Benchmarks for basic scheduling problems[J].European Jurnal of Operational Research,1993,64(2):278-285.
[45]ADAMS J,BALAS E,ZAWACK D.The shifting bottleneckprocedure for job shop scheduling[J].Management Science,1988,34(3):391-401.
[46]FISHER H.Probabilistic learning combinations of local job-shop scheduling rules[J/OL].https://xs2.zidianzhan.net/scholar?hl=zh-CN&as_sdt=0%2C5&q=Probabilistic+learning+combinations+of+local+job-shop+scheduling+rules&btnG=#d=gs_cit&t=1681371135544&u=%2Fscholar%3Fq%3Dinfo%3ATfo-0odqynIJ%3Ascholar.google.com%2F%26output%3Dcite%26scirp%3D0%26hl%3Dzh-CN.
[47]LAWRENCE S.Resouce constrained project scheduling:An experimental investigation of heuristic scheduling techniques (Supplem-ent)[J/OL].https://xs2.zidianzhan.net/scholar?hl=zh-CN&as_sdt=0%2C5&q=Resouce+constrained+project+scheduling%3A+An+experimental+investigation+of+heuristic+scheduling+techniques+&btnG=#d=gs_cit&t=1681373202282&u=%2Fscholar%3Fq%3Dinfo%3AXvOT-Lo5cCigJ%3Ascholar.google.com%2F%26output%3Dcite%26scirp%3D0%26hl%3Dzh-CN.
[48]STORER R H,WU S D,VACCARI R.New search spaces for sequencing problems with application to job shop scheduling[J].Manag-ement Science,1992,38(10):1495-1509.
[49]YAMADA T,NAKANO R.A genetic algorithm applicable to large-scale job-shop problems[C]//PPSN.1992:281-290.
[50]APPLEGATE D,COOK W.A computational study of the job-shop scheduling problem[J].ORSA Journal on Computing,1991,3(2):149-156.
[51]FEY M,LENSSEN J E.Fast Graph Representation Learning with PyTorch Geometric[J].arXiv:1903.02428,2019.

Related Articles 15

[1]	LIN Xiangyang, XING Qinghua, XING Huaixi. Study on Intelligent Decision Making of Aerial Interception Combat of UAV Group Based onMADDPG [J]. Computer Science, 2023, 50(6A): 220700031-7.
[2]	DENG Guanghong, ZHANG Qiheng. Container-based Scheduling Architecture for Mixed-Criticality Systems [J]. Computer Science, 2023, 50(6A): 220800215-5.
[3]	DENG Shengnan, LUO Taiyu, HUANG Jingcai, REN Yuqing, SONG Wei, SU Chang, LEI Lili, HU Guanghui, XU Hong. Design and Implementation of Natural Gas Intelligent Scheduling Computer Platform System [J]. Computer Science, 2023, 50(6A): 220700258-7.
[4]	SHI Liang, WEN Liangming, LEI Sheng, LI Jianhui. Virtual Machine Consolidation Algorithm Based on Decision Tree and Improved Q-learning by Uniform Distribution [J]. Computer Science, 2023, 50(6): 36-44.
[5]	WANG Zhuang, WANG Pinghui, WANG Bincheng, WU Wenbo, WANG Bin, CONG Pengyu. GPU Shared Scheduling System Under Deep Learning Container Cloud Platform [J]. Computer Science, 2023, 50(6): 86-91.
[6]	WANG Hanmo, ZHENG Shijie, XU Ruonan, GUO Bin, WU Lei. Self Reconfiguration Algorithm of Modular Robot Based on Swarm Agent Deep Reinforcement Learning [J]. Computer Science, 2023, 50(6): 266-273.
[7]	ZHANG Qiyang, CHEN Xiliang, CAO Lei, LAI Jun, SHENG Lei. Survey on Knowledge Transfer Method in Deep Reinforcement Learning [J]. Computer Science, 2023, 50(5): 201-216.
[8]	YU Ze, NING Nianwen, ZHENG Yanliu, LYU Yining, LIU Fuqiang, ZHOU Yi. Review of Intelligent Traffic Signal Control Strategies Driven by Deep Reinforcement Learning [J]. Computer Science, 2023, 50(4): 159-171.
[9]	XIE Yongsheng, HUANG Xiangheng, CHEN Ningjiang. Self-balanced Scheduling Strategy for Container Cluster Based on Improved DQN Algorithm [J]. Computer Science, 2023, 50(4): 233-240.
[10]	PAN Jikui, DONG Xinyi, LU Zhenghao, WANG Zijian, SUN Fuquan. Deadline Constrained Scheduling Optimization Algorithm for Workflow in Clouds Using Spot Instance [J]. Computer Science, 2023, 50(4): 257-264.
[11]	XU Linling, ZHOU Yuan, HUANG Hongyun, LIU Yang. Real-time Trajectory Planning Algorithm Based on Collision Criticality and Deep Reinforcement Learning [J]. Computer Science, 2023, 50(3): 323-332.
[12]	Cui ZHANG, En WANG, Funing YANG, Yong jian YANG , Nan JIANG. UAV Frequency-based Crowdsensing Using Grouping Multi-agentDeep Reinforcement Learning [J]. Computer Science, 2023, 50(2): 57-68.
[13]	WANG Jiwang, SHEN Liwei. Fine-grained Action Allocation and Scheduling Method for Dynamic Heterogeneous Tasks in Multi-robot Environments [J]. Computer Science, 2023, 50(2): 244-253.
[14]	HUANG Yuzhou, WANG Lisong, QIN Xiaolin. Bi-level Path Planning Method for Unmanned Vehicle Based on Deep Reinforcement Learning [J]. Computer Science, 2023, 50(1): 194-204.
[15]	RONG Huan, QIAN Minfeng, MA Tinghuai, SUN Shengjie. Novel Class Reasoning Model Towards Covered Area in Given Image Based on InformedKnowledge Graph Reasoning and Multi-agent Collaboration [J]. Computer Science, 2023, 50(1): 243-252.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Optimization Algorithms for Job Shop Scheduling Problems Based on Correction Mechanisms and Reinforcement Learning

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 15

Metrics

Comments

Recommended 0