Computer Science, 2025, Vol. 52, Issue (6A): 240500054-10. doi: 10.11896/jsjkx.240500054
PIAO Mingjie1, ZHANG Dongdong1, LU Hu2, LI Rupeng2, GE Xiaoli2
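Abstract: Effective supply chain inventory management is crucial for large-scale manufacturing industries such as civil aircraft and automobile manufacturing, because it keeps production running efficiently. Typically, the main manufacturer draws up an annual inventory management plan and, following the actual production schedule, contacts suppliers when certain materials approach their critical inventory levels. However, changes in actual production may force the annual plan to change, so deciding future material purchases from the actual production situation and current inventory levels is more flexible and efficient. In recent years, many researchers have applied reinforcement learning to inventory management. For the multi-node, multi-material supply chain inventory management problem in civil aircraft manufacturing, current methods can deliver reasonably efficient management, but at a high computational complexity. To address this, the problem is formalized as a partially observable Markov decision process (POMDP), and a multi-agent supply chain inventory management method based on an improved Transformer is proposed. Building on the sequential decision-making nature of multi-agent reinforcement learning, the method recasts the multi-agent reinforcement learning problem as a sequence modeling problem with an encoder-decoder architecture, thereby reducing the algorithm's complexity. Experimental results show that, compared with existing reinforcement-learning-based methods, the proposed method achieves comparable performance while reducing complexity by about 90%.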
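To make the "multi-agent RL as sequence modeling" idea concrete, the sketch below shows only the autoregressive factorization that encoder-decoder approaches (such as the Multi-Agent Transformer) rely on: the joint policy is written as π(a1, …, an | o) = ∏ π(ai | o, a1, …, a(i-1)), so agent i chooses its action after seeing the actions of the agents decoded before it. This is a minimal illustration under stated assumptions, not the authors' improved Transformer: the "encoder output" is replaced by a hypothetical list of per-node inventory gaps, the learned decoder by a hand-written scoring rule (`make_toy_policy`), and the shared order budget is an invented coupling. The two size functions illustrate one general reason sequence decoding cuts complexity (n·|A| per-step outputs instead of |A|^n joint outputs); they are not how the paper measured its roughly 90% improvement.

```python
from typing import Callable, List, Sequence

# A per-agent policy receives the shared observation encoding plus the actions already
# chosen by earlier agents, and returns one score per discrete action (order quantity).
AgentPolicy = Callable[[Sequence[float], Sequence[int]], List[float]]


def joint_output_size(num_agents: int, num_actions: int) -> int:
    """Scores one network must emit to rank every joint action directly."""
    return num_actions ** num_agents


def sequential_output_size(num_agents: int, num_actions: int) -> int:
    """Total scores emitted when agents are decoded one at a time."""
    return num_agents * num_actions


def decode_greedy(encoded_obs: Sequence[float], num_agents: int,
                  policy: AgentPolicy) -> List[int]:
    """Autoregressive decoding: agent i conditions on actions of agents 0..i-1."""
    actions: List[int] = []
    for _ in range(num_agents):
        scores = policy(encoded_obs, actions)
        # Greedy choice: index of the highest-scoring action for this agent.
        actions.append(max(range(len(scores)), key=scores.__getitem__))
    return actions


def make_toy_policy(num_actions: int, budget: int) -> AgentPolicy:
    """Hypothetical stand-in for a learned decoder: each node wants to close its
    inventory gap, but all nodes share one order budget, so later agents must
    account for what earlier agents already ordered."""
    def policy(gaps: Sequence[float], previous: Sequence[int]) -> List[float]:
        i = len(previous)                           # index of the agent being decoded
        remaining = max(0, budget - sum(previous))  # budget left after earlier orders
        desired = min(gaps[i], remaining)           # never exceed the shared budget
        return [-(a - desired) ** 2 for a in range(num_actions)]
    return policy


if __name__ == "__main__":
    gaps = [3.0, 4.0, 2.0]  # hypothetical shortfalls below target stock at three nodes
    policy = make_toy_policy(num_actions=5, budget=6)
    print(decode_greedy(gaps, 3, policy))   # [3, 3, 0]
    print(joint_output_size(3, 5))          # 125
    print(sequential_output_size(3, 5))     # 15
```

In the usage example, the second node would prefer 4 units but is limited to 3 by what the first node already ordered, and the third node gets nothing; an independent per-agent policy that ignored earlier actions could not express this coordination without scoring the full joint action space.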