一种基于修正机制和强化学习的作业车间调度问题的优化算法

doi:10.11896/jsjkx.220900112

计算机科学 ›› 2023, Vol. 50 ›› Issue (6): 274-282.doi: 10.11896/jsjkx.220900112

一种基于修正机制和强化学习的作业车间调度问题的优化算法

苗宽¹, 李崇寿^1,2

1 西南交通大学计算机与人工智能学院成都 610097
2 西南交通大学利兹学院成都 610097

收稿日期:2022-09-13 修回日期:2022-11-09 出版日期:2023-06-15 发布日期:2023-06-06
通讯作者: 李崇寿(lics@swjtu.edu.cn)
作者简介:(1084083934@qq.com)
基金资助:
国家自然科学基金(62202395);四川省自然科学基金(2022NSFSC0930);中央高校基本科研业务费专项资金(2682022CX067);四川省重点研发项目(2022YFG0028)

Optimization Algorithms for Job Shop Scheduling Problems Based on Correction Mechanisms and Reinforcement Learning

MIAO Kuan¹, LI Chongshou^1,2

1 School of Artificial Intelligence and Computing,Southwest Jiaotong University,Chengdu 610097,China
2 SWJTU-Leeds Joint School,Southwest Jiaotong University,Chengdu 610097,China

Received:2022-09-13 Revised:2022-11-09 Online:2023-06-15 Published:2023-06-06
About author:MIAO Kuan,born in 1998,postgra-duate,is a member of China Computer Federation.His main research interests include reinforcement learning and combinatorial optimisation.LI Chongshou,born in 1988,Ph.D,associate professor,is a member of China Computer Federation.His main research interests include multi-scale data intelligence,AI and applied optimization.
Supported by:
National Natural Science Foundation of China(62202395),Natural Science Foundation of Sichuan Province,China(2022NSFSC0930),Fundamental Research Funds for the Central Universities of Ministry of Education of China(2682022CX067) and Key R & D Project of Sichuan Province,China(2022YFG0028).

摘要/Abstract

摘要： 近年来,使用深度强化学习解决作业车间调度问题的研究主要集中于构造法,通过将作业车间调度问题视为顺序决策问题,逐步选择调度节点从而得到完整的解。尽管这种算法思想已经取得了不小的成果,但仍面临奖励构造困难、解决方案质量不高的问题,因此这一方法的发展受到制约。针对这些问题,设计了一种基于图神经网络和近端策略优化算法的强化学习构造框架。同时,针对因训练与测试数据分布不一致而带来的次优解问题,还设计了一种修正交换算子,以保证解的质量。最后,为了证明算法的有效性,在公开数据集和生成的数据集上进行了实验。实验结果表明,所提算法在中小规模实例上的结果优于目前最好的强化学习框架,不仅充分发挥了构造式强化学习框架求解迅速的优势,还通过修正机制有效缓解了次优选择问题,缩短了实例的最大完成时间。

关键词: 调度, 作业车间调度问题, 强化学习, 修正搜索算法

Abstract: In recent years,research on using deep reinforcement learning to solve job shop scheduling problems has concentrated on construction techniques,which model the scheduling problem as sequential choice problems and gradually select scheduling nodes for a complete solution.Although this algorithmic theory has produced impressive results,it still suffers from complicated reward formulation and poor solution quality,which prevents its future development.In this study,we design a reinforcement learning construction framework based on graphical neural networks and proximal policy optimisation algorithms,and an innovative and efficient search correction mechanism with a modified swap operator is proposed to enhance the solution quality.It searches the area around a known solution using a Monte Carlo tree,correcting the issue of suboptimal solution selection caused by the discrepancy between training and testing data.The proposed algorithm is comprehensively investigated on public and synthetic datasets.Experimental results demonstrate that the algorithm outperforms the state-of-the-art reinforcement learning framework on both small and medium-sized examples.It not only fully exploits the advantages of rapid solution of constructive reinforcement learning framework,but also effectively corrects the sub-optimal choice through the correction mechanism,reducing the maximum completion time in worst cases.

Key words: Scheduling, Job shop scheduling problems, Reinforcement learning, Modified search algorithms

中图分类号:

TP181

苗宽, 李崇寿. 一种基于修正机制和强化学习的作业车间调度问题的优化算法[J]. 计算机科学, 2023, 50(6): 274-282. https://doi.org/10.11896/jsjkx.220900112

MIAO Kuan, LI Chongshou. Optimization Algorithms for Job Shop Scheduling Problems Based on Correction Mechanisms and Reinforcement Learning[J]. Computer Science, 2023, 50(6): 274-282. https://doi.org/10.11896/jsjkx.220900112

参考文献

[1]GLOVER F,LANGUNA M.Tabu search[M]//Handbook of combinatorial optimization.Boston:Springer,1998:2093-2229.
[2]SILVER D,HUBERT T,SCHRITTWIESER J,et al.A general reinforcement learning algorithm that masters chess,shogi,and Go through self-play[J].Science,2018,362(6419):1140-1144.
[3]VINYALS O,BABUSCHKIN I,CZARNECKI W M,et al.Grandmaster level in StarCraft II using multi-agent reinforcement learning[J].Nature,2019,575(7782):350-354.
[4]BERNER C,BROCKMAN G,CHAN B,et al.Dota 2 with large scale deep reinforcement learning[J].arXiv:1912.06680,2019.
[5]GAREY M R,JOHNSON D S,SETHI R.The complexity of flowshop and jobshop scheduling[J].Mathematics of Operations Research,1976,1(2):117-129.
[6]GIFFLER B,THOMPSON G L.Algorithms for solving production-scheduling problems[J].Operations Research,1960,8(4):487-503.
[7]HAUPT R.A survey of priority rule-based scheduling[J].Operations-Research-Spektrum,1989,11(1):3-16.
[8]GEIGER C D,UZSOY R,AYTUGˇ H.Rapid modeling and discovery of priority dispatching rules:An autonomous learning approach[J].Journal of Scheduling,2006,9(1):7-34.
[9]ZHANG C,SONG W,CAO Z,et al.Learning to dispatch for job shop scheduling via deep reinforcement learning[J].Advances in Neural Information Processing Systems,2020,33:1621-1632.
[10]PARK J,CHUN J,KIM S H,et al.Learning to schedule job-shop problems:representation and policy learning using graph neural network and reinforcement learning[J].International Journal of Production Research,2021,59(11):3360-3377.
[11]PARK J,BAKHTIYAR S,PARK J.Schedu-leNet:Learn tosolve multi-agent scheduling problems with reinforcement lear-ning[J].arXiv:2106.03051,2021.
[12]TASSEL P,GEBSER M,SCHEKOTIHIN K.A reinforcement learning environment for job-shop scheduling[J].arXiv:2104.03760,2021.
[13]MONACI M,AGASUCCI V,GRANI G.An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem[J].arXiv:2110.09076,2021.
[14]BOGYRBAYEVA A,MERALIYEV M,MUS-TAKHOV T,et al.Learning to Solve Vehicle Routing Problems:A Survey[J].arXiv:2205.02453,2022.
[15]XIN L,SONG W,CAO Z,et al.Generative Adversarial Training for Neural Combinatorial Optimization Models[J/OL].https://xs2.zidianzhan.net/scholar?hl=zh-CN&as_sdt=0%2C5&q=Generative+Adversarial+Training+for+Neural+Combinatorial+Optimization+Models&btnG=.
[16]KOOL W,VAN HOOF H,WELLING M.Att-ention,learn to solve routing problems![J].arXiv:1803.08475,2018.
[17]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximalpolicy optimization algorithms[J].arXiv:1707.06347,2017.
[18]BENGIO Y,LODI A,PROUVOST A.Machine learning forcombinatorial optimization:a methodological tour d'horizon[J].European Journal of Operational Research,2021,290(2):405-421.
[19]JOSHI C K,LAURENT T,BRESSON X.An efficient graphconvolutional network technique for the travelling salesman problem[J].arXiv:1906.01227,2019.
[20]LU H,ZHANG X,YANG S.A learning-based iterative method for solving vehicle routing problems[C]//International Confe-rence on Learning Representations.2019.
[21]ROY A,SAFFAR M,VASWANI A,et al.Efficient content-based sparse attention with routing transformers[J].Transactions of the Association for Computational Linguistics,2021,9:53-68.
[22]ZHOU J,CUI G,HU S,et al.Graph neural networks:A review of methods and applications[J].AI Open,2020,1:57-81.
[23]DWIVEDI V P,JOSHI C K,LAURENT T,et al.Benchmarking graph neural networks[J].arXiv:2003.00982,2020.
[24]KHALIL E,DAI H,ZHANG Y,et al.Learning combinatorial optimization algorithms over graphs[J].Advances inNeural Information Processing Systems,2017,30:6348-6358.
[25]XU K,HU W,LESKOVEC J,et al.How powerful are graphneural networks?[J].arXiv:1810.00826,2018.
[26]PERRON L,FURNON V.Or-tools.UR-L [OL].https://de-velopers.google.com/optimization.
[27]ZENG Y,LIAO Z,DAI Y,et al.Hybrid intelligence for dynamic job-shop scheduling with deep reinforcement learning and attention mecha-nism[J].arXiv:2201.00548,2022.
[28]MONACI M,AGASUCCI V,GRANI G.An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem[J].arXiv:2110.09076,2021.
[29]HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural computation,1997,9(8):1735-1780.
[30]NI F,HAO J,LU J,et al.A Multi-Graph Attributed Reinforcement Learning Based Optimization Algorithm for Large-Scale Hybrid Flow Shop Scheduling Problem[C]//Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining.2021:3441-3451.
[31]NAWAZ M,ENSCORE JR E E,HAM I.A heuristic algorithm for the m-machine,n-job flow-shop sequencing problem[J].Omega,1983,11(1):91-95.
[32]OREN J,ROSS C,LEFAROV M,et al.SOLO:search online,learn offline for combinatorial optimization problems[C]//Proceedings of the International Symposium on Combinatorial Search.2021:97-105.
[33]WANG R,HUA Z,LIU G,et al.A bi-level framework forlearning to solve combinatorial optimization on graphs[J].Advances in Neural Information Processing Systems,2021,34:21453-21466.
[34]XIN L,SONG W,CAO Z,et al.Step-wise deep learning models for solving routing problems[J].IEEE Transactions on Indus-trial Informatics,2020,17(7):4861-4871.
[35]KIPF T N,WELLING M.Semi-supervised classification withgraph convolutional networks[J].arXiv:1609.02907,2016.
[36]VELICˇKOVIC＇ P,CUCURULL G,CASA-NOVA A,et al.Graph attention networks[J].arXiv:1710.10903,2017.
[37]HE K,GKIOXARI G,DOLLÁR P,et al.Mask r-cnn[C]//Proceedings of the IEEEInternational Conference on Computer Vision.2017:2961-2969.
[38]SCHULMAN J,LEVINE S,ABBEEL P,et al.Trust region po-licy optimization[C]//International Conference on Machine Learning.PMLR,2015:1889-1897.
[39]ENGSTROM L,ILYAS A,SANTURKAR S,et al.Implementation matters in deep policy gradients:A case study on PPO and TRPO[J].arXiv:2005.12729,2020.
[40]BROWNE C B,POWLEY E,WHITEHOUSE D,et al.A survey of monte carlo tree search methods[J].IEEE Transactions on Computational Intelligence and AI in games,2012,4(1):1-43.
[41]COULOM R.Efficient selectivity and backup operators inMonte-Carlo tree search[C]//International Conference on Computers and Games.Berlin:Springer,2006:72-83.
[42]SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[43]SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of go without human knowledge[J].Nature,2017,550(7676):354-359.
[44]TAILLARD E.Benchmarks for basic scheduling problems[J].European Jurnal of Operational Research,1993,64(2):278-285.
[45]ADAMS J,BALAS E,ZAWACK D.The shifting bottleneckprocedure for job shop scheduling[J].Management Science,1988,34(3):391-401.
[46]FISHER H.Probabilistic learning combinations of local job-shop scheduling rules[J/OL].https://xs2.zidianzhan.net/scholar?hl=zh-CN&as_sdt=0%2C5&q=Probabilistic+learning+combinations+of+local+job-shop+scheduling+rules&btnG=#d=gs_cit&t=1681371135544&u=%2Fscholar%3Fq%3Dinfo%3ATfo-0odqynIJ%3Ascholar.google.com%2F%26output%3Dcite%26scirp%3D0%26hl%3Dzh-CN.
[47]LAWRENCE S.Resouce constrained project scheduling:An experimental investigation of heuristic scheduling techniques (Supplem-ent)[J/OL].https://xs2.zidianzhan.net/scholar?hl=zh-CN&as_sdt=0%2C5&q=Resouce+constrained+project+scheduling%3A+An+experimental+investigation+of+heuristic+scheduling+techniques+&btnG=#d=gs_cit&t=1681373202282&u=%2Fscholar%3Fq%3Dinfo%3AXvOT-Lo5cCigJ%3Ascholar.google.com%2F%26output%3Dcite%26scirp%3D0%26hl%3Dzh-CN.
[48]STORER R H,WU S D,VACCARI R.New search spaces for sequencing problems with application to job shop scheduling[J].Manag-ement Science,1992,38(10):1495-1509.
[49]YAMADA T,NAKANO R.A genetic algorithm applicable to large-scale job-shop problems[C]//PPSN.1992:281-290.
[50]APPLEGATE D,COOK W.A computational study of the job-shop scheduling problem[J].ORSA Journal on Computing,1991,3(2):149-156.
[51]FEY M,LENSSEN J E.Fast Graph Representation Learning with PyTorch Geometric[J].arXiv:1903.02428,2019.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

一种基于修正机制和强化学习的作业车间调度问题的优化算法

Optimization Algorithms for Job Shop Scheduling Problems Based on Correction Mechanisms and Reinforcement Learning

PDF (PC)

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 0

Metrics

本文评价

推荐阅读 0