Computer Science, 2025, Vol. 52, Issue (6A): 240500054-10. doi: 10.11896/jsjkx.240500054

• Artificial Intelligence •

Multi-agent Supply Chain Inventory Management Method Based on Improved Transformer

PIAO Mingjie1, ZHANG Dongdong1, LU Hu2, LI Rupeng2, GE Xiaoli2

  1. College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
    2. Aviation Manufacturing Technology Research Institute, Shanghai Aircraft Manufacturing Co., Ltd., Shanghai 201324, China
  • Online: 2025-06-16 Published: 2025-06-12
  • Corresponding author: ZHANG Dongdong (ddzhang@tongji.edu.cn)
  • About author: (2130784@tongji.edu.cn)
  • Supported by:
    National Key R&D Program of China (2021YFB3301901)

Study on Multi-agent Supply Chain Inventory Management Method Based on Improved Transformer

PIAO Mingjie1, ZHANG Dongdong1, LU Hu2, LI Rupeng2, GE Xiaoli2   

  1. College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
    2. Aviation Manufacturing Technology Research Institute, Shanghai Aircraft Manufacturing Co., Ltd., Shanghai 201324, China
  • Online: 2025-06-16 Published: 2025-06-12
  • About author: PIAO Mingjie, born in 1999, postgraduate. His main research interests include reinforcement learning and aircraft manufacturing supply chain inventory management.
    ZHANG Dongdong, born in 1977, Ph.D, professor, Ph.D supervisor. Her main research interests include image processing and deep learning.
  • Supported by:
    National Key R&D Program of China (2021YFB3301901).

摘要 (Abstract): Effective supply chain inventory management is crucial for large-scale manufacturing industries such as civil aircraft and automotive manufacturing, as it ensures efficient production operations. Typically, the main manufacturer formulates an annual inventory management plan and, based on the actual production schedule, contacts suppliers when certain materials approach critical inventory levels. However, changes in actual production conditions may force the annual inventory management plan to be revised, so making future procurement decisions according to the actual production situation and inventory levels is comparatively more flexible and efficient. In recent years, many researchers have focused on applying reinforcement learning methods to inventory management problems. Although current methods can manage the multi-node, multi-material inventory problem of the civil aircraft manufacturing supply chain with a certain degree of effectiveness, they do so at high complexity. To address this issue, the problem is formalized as a partially observable Markov decision process model, and a multi-agent supply chain inventory management method based on an improved Transformer is proposed. Building on the sequential-decision nature of multi-agent reinforcement learning, the method recasts the multi-agent reinforcement learning problem as a sequence modeling problem with an encoder-decoder architecture, logically reducing the algorithm's complexity. Experimental results show that, compared with existing reinforcement learning-based methods, the proposed method achieves roughly a 90% improvement in complexity while maintaining comparable performance.

关键词 (Keywords): Multi-agent reinforcement learning, Aircraft supply chain inventory management, Partially observable Markov decision process, Transformer

Abstract: Effective supply chain inventory management is crucial for large-scale manufacturing industries such as civil aircraft and automotive manufacturing, as it ensures efficient production operations. Typically, the main manufacturer formulates an annual inventory management plan and contacts suppliers when certain materials approach critical inventory levels based on the actual production schedule. However, changes in actual production conditions may necessitate alterations to the annual inventory management plan. Therefore, making procurement decisions based on actual production conditions and inventory is relatively more flexible and efficient. In recent years, many researchers have focused on using reinforcement learning methods to study inventory management problems. Current methods can achieve a certain degree of efficient management when solving the inventory management problem in the civil aircraft manufacturing supply chain with a multi-node and multi-material model, but with high complexity. To address this issue, we formalize the problem as a partially observable Markov decision process model and propose a multi-agent supply chain inventory management method based on an improved Transformer. This method transforms the multi-agent reinforcement learning problem into a sequence modeling problem with an encoder-decoder architecture, based on the essentially sequential decision-making of multi-agent reinforcement learning, logically reducing the complexity of the algorithm. Experimental results show that, compared to existing reinforcement learning-based methods, the proposed method achieves about a 90% improvement in complexity while maintaining similar performance.
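To make the encoder-decoder formulation described above concrete, the following PyTorch sketch shows one way multi-agent decision-making can be cast as sequence modeling: the encoder embeds one observation token per agent, and the decoder emits discrete order actions agent by agent, each conditioned on the actions already chosen. This is only an illustrative sketch under assumed dimensions and a discrete order-quantity action space; the class name, sizes, and the toy rollout are assumptions, not the implementation evaluated in the paper.

```python
# Minimal sketch (not the authors' implementation) of multi-agent decision-making
# as encoder-decoder sequence modeling: encode all agents' observations, then
# decode actions one agent at a time, conditioning on previously chosen actions.
import torch
import torch.nn as nn


class MultiAgentSeqPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, n_agents, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.n_agents = n_agents
        self.obs_embed = nn.Linear(obs_dim, d_model)         # one token per agent observation
        self.act_embed = nn.Embedding(act_dim + 1, d_model)  # +1 for a begin-of-sequence token
        self.bos = act_dim
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, act_dim)              # per-agent action logits

    def forward(self, obs):
        """obs: (batch, n_agents, obs_dim) -> sampled actions: (batch, n_agents)"""
        batch = obs.shape[0]
        # Encoder: embed every agent's local observation (agent-order embeddings omitted for brevity).
        memory = self.transformer.encoder(self.obs_embed(obs))
        tokens = torch.full((batch, 1), self.bos, dtype=torch.long, device=obs.device)
        actions = []
        for _ in range(self.n_agents):                       # decode actions agent by agent
            tgt = self.act_embed(tokens)
            mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(obs.device)
            out = self.transformer.decoder(tgt, memory, tgt_mask=mask)
            logits = self.head(out[:, -1])                    # logits for the current agent
            a = torch.distributions.Categorical(logits=logits).sample()
            actions.append(a)
            tokens = torch.cat([tokens, a.unsqueeze(1)], dim=1)
        return torch.stack(actions, dim=1)


# Toy usage: 5 supplier agents, each observing a 6-dim local inventory state and
# choosing one of 4 discrete order quantities (all numbers here are made up).
policy = MultiAgentSeqPolicy(obs_dim=6, act_dim=4, n_agents=5)
obs = torch.randn(2, 5, 6)                                    # batch of 2 joint observations
print(policy(obs).shape)                                      # -> torch.Size([2, 5])
```

Because the decoder produces the joint action one agent at a time, the computation per decision step grows with the length of the agent sequence rather than with the size of the joint action space, which is in the spirit of the complexity reduction the abstract describes.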

Key words: Multi-agent reinforcement learning, Aircraft supply chain inventory management, Partially observable Markov decision process, Transformer

CLC number: TP399