Computer Science, 2025, Vol. 52, Issue (6A): 240500054-10. DOI: 10.11896/jsjkx.240500054

• Artificial Intelligence •

Study on Multi-agent Supply Chain Inventory Management Method Based on Improved Transformer

PIAO Mingjie1, ZHANG Dongdong1, LU Hu2, LI Rupeng2, GE Xiaoli2   

  1. College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
  2. Aviation Manufacturing Technology Research Institute, Shanghai Aircraft Manufacturing Co., Ltd., Shanghai 201324, China
  • Online: 2025-06-16  Published: 2025-06-12
  • About author: PIAO Mingjie, born in 1999, postgraduate. His main research interests include reinforcement learning and aircraft manufacturing supply chain inventory management.
    ZHANG Dongdong, born in 1977, Ph.D., professor, Ph.D. supervisor. Her main research interests include image processing and deep learning.
  • Supported by:
    National Key R&D Program of China (2021YFB3301901).

Abstract: Effective supply chain inventory management is crucial for large-scale manufacturing industries such as civil aircraft and automotive manufacturing, as it ensures efficient production operations. Typically, the main manufacturer formulates an annual inventory management plan and contacts suppliers when certain materials approach critical inventory levels according to the actual production schedule. However, changes in actual production conditions may necessitate alterations to the annual plan. Procurement decisions made from actual production conditions and current inventory are therefore more flexible and efficient. In recent years, many researchers have applied reinforcement learning methods to inventory management problems. Existing methods can manage the multi-node, multi-material inventory of the civil aircraft manufacturing supply chain with reasonable effectiveness, but at high computational complexity. To address this issue, we formalize the problem as a partially observable Markov decision process and propose a multi-agent supply chain inventory management method based on an improved Transformer. Exploiting the sequential nature of multi-agent decision-making, the method casts multi-agent reinforcement learning as a sequence modeling problem with an encoder-decoder architecture, which reduces the complexity of the algorithm. Experimental results show that, compared with existing reinforcement learning-based methods, the proposed method reduces complexity by about 90% while maintaining comparable performance.
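To make the encoder-decoder sequence-modeling idea in the abstract concrete, below is a minimal PyTorch sketch in which each node-material pair is treated as one agent: all agents' local observations are encoded once, and order actions are then decoded agent by agent, each conditioned on the actions already chosen. This is an illustration under stated assumptions, not the authors' implementation; the class name InventorySeqModel, the observation layout (local stock, in-transit orders, demand signal) and the discrete order-level action space are hypothetical choices made for this example.

```python
# Illustrative sketch only: encoder-decoder Transformer that decodes
# per-agent inventory ordering actions sequentially (multi-agent
# sequence modeling), not the paper's released code.
import torch
import torch.nn as nn


class InventorySeqModel(nn.Module):
    """Encode all agents' local observations, then decode order quantities
    agent by agent, conditioning each decision on previously decoded ones."""

    def __init__(self, obs_dim, act_dim, n_agents, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Linear(act_dim, d_model)
        # learned agent-identity embedding (plays the role of positional encoding)
        self.agent_pos = nn.Parameter(torch.zeros(n_agents, d_model))
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            batch_first=True)
        self.policy_head = nn.Linear(d_model, act_dim)  # logits over discrete order levels
        self.n_agents, self.act_dim = n_agents, act_dim

    def forward(self, obs):
        # obs: (batch, n_agents, obs_dim) -- each agent observes only its own
        # stock level, in-transit orders and local demand signal (partial observability)
        b = obs.size(0)
        memory_in = self.obs_embed(obs) + self.agent_pos           # (b, n_agents, d_model)
        actions = torch.zeros(b, self.n_agents, self.act_dim, device=obs.device)
        logits_all = []
        # autoregressive decoding: agent i sees the actions chosen by agents < i
        for i in range(self.n_agents):
            tgt = self.act_embed(actions[:, : i + 1]) + self.agent_pos[: i + 1]
            mask = nn.Transformer.generate_square_subsequent_mask(i + 1).to(obs.device)
            dec = self.transformer(memory_in, tgt, tgt_mask=mask)  # (b, i+1, d_model)
            logits = self.policy_head(dec[:, -1])                  # (b, act_dim)
            logits_all.append(logits)
            a = torch.distributions.Categorical(logits=logits).sample()
            actions[:, i] = nn.functional.one_hot(a, self.act_dim).float()
        return torch.stack(logits_all, dim=1), actions             # per-agent logits, joint action
```

In a full training loop, the per-agent logits would feed a standard policy-gradient objective (for instance a PPO-style clipped loss) together with a value head over the encoder output. The point the sketch illustrates is that a single sequential decoding pass produces the joint action, so the model does not need a separate policy network for every node-material agent, which is where the reduction in complexity comes from.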

Key words: Multi-agent reinforcement learning, Aircraft supply chain inventory management, Partially observable Markov decision process, Transformer

CLC Number: TP399