Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210800261-9. doi: 10.11896/jsjkx.210800261

• Artificial Intelligence •

Optimal Order Acceptance Decision Based on After-state Reinforcement Learning

QIAN Jing, WU Ke-yu, CHEN Chao, HU Xing-chen   

  1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  • Online: 2022-11-10  Published: 2022-11-21
  • About author:QIAN Jing, born in 1998, postgraduate. Her main research interests include reinforcement learning and computer intelligent decision-making technology.
    WU Ke-yu, born in 1990, assistant professor. His main research interests include reinforcement learning, deep learning and their applications in networked systems.
  • Supported by:
    National Natural Science Foundation of China (62001495) and Natural Science Foundation of Hunan Province, China (2020JJ5675).

Abstract: As customer demand becomes increasingly diversified, the make-to-order (MTO) model, i.e., adapting the production scheme according to customers' orders, has attracted growing attention from industry. Deciding whether to accept incoming orders according to the enterprise's limited production capacity and current order status is crucial for improving its profits. Building on traditional order acceptance problems, this paper proposes a more complete model: besides the traditional model elements (delayed delivery cost, rejection cost and production cost), it further considers order inventory cost, customer priority and other factors, and formulates the optimal order acceptance problem as a Markov decision process (MDP). Because the classic MDP approach relies on solving and estimating a high-dimensional state value function, its computational complexity is high. To reduce this complexity, this paper proves that the optimal policy defined by the state value function of the classical MDP can be equivalently defined and constructed from a value function over after-states, thereby transforming the multi-dimensional control problem into a one-dimensional one. To cope with the continuous state space, the after-state value function is further parameterized by a neural network, which resolves the difficulty of a large state space. Finally, simulation experiments verify the applicability and superiority of the proposed order acceptance model and algorithm.
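To make the after-state idea concrete: in an order-acceptance MDP with state s = (backlog, incoming order) and action a in {accept, reject}, the after-state u(s, a) is the deterministic post-decision backlog, so the greedy rule argmax_a [r(s, a) + gamma * V(u(s, a))] only ever needs a one-dimensional value function. The sketch below is a hypothetical toy illustrating this, not the paper's actual model: the order-arrival distribution, cost parameters, network size and learning rate are all invented for illustration, and the value network is a hand-rolled one-hidden-layer approximator trained by TD(0).

```python
# Toy sketch of after-state value learning for order acceptance.
# Every constant and model choice here is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

CAPACITY = 10.0      # workload processed per period (assumed)
HOLD_COST = 0.1      # holding cost per unit of post-decision backlog (assumed)
REJECT_COST = 1.0    # penalty for rejecting an order (assumed)
GAMMA, EPS, LR = 0.95, 0.1, 1e-3

def arrive():
    """Sample an incoming order as (workload, revenue); a stand-in model."""
    w = rng.uniform(1.0, 8.0)
    return w, 2.0 * w + rng.normal(0.0, 1.0)

def after_state(backlog, order_w, accept):
    """Deterministic state right after the accept/reject decision.
    The multi-dimensional state (backlog, order) collapses to the
    one-dimensional post-decision backlog."""
    return backlog + order_w if accept else backlog

# One-hidden-layer approximator V(after-state), trained by plain SGD.
H = 16
W1 = rng.normal(0.0, 0.5, H); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, H); b2 = 0.0

def value(x):
    h = np.tanh(W1 * x + b1)              # hidden activations
    return W2 @ h + b2, h

def td_update(x, target):
    """One gradient step on 0.5 * (V(x) - target)**2."""
    global b2
    v, h = value(x)
    err = v - target
    gh = err * W2 * (1.0 - h ** 2)        # backprop through tanh
    W2 -= LR * err * h; b2 -= LR * err
    W1 -= LR * gh * x;  b1 -= LR * gh

# After-state TD(0): score both decisions, act, learn, advance.
backlog, (w, r) = 0.0, arrive()
prev_as = None
for t in range(50_000):
    scored = []
    for accept in (True, False):
        a_s = after_state(backlog, w, accept)
        rwd = (r if accept else -REJECT_COST) - HOLD_COST * a_s
        scored.append((rwd + GAMMA * value(a_s)[0], a_s))
    best_q, best_as = max(scored)         # greedy over the 1-D value function
    if rng.random() < EPS:                # occasional exploration
        best_q, best_as = scored[rng.integers(2)]
    if prev_as is not None:
        td_update(prev_as, best_q)        # target: r + gamma * V(next after-state)
    prev_as = best_as
    backlog = max(best_as - CAPACITY, 0.0)  # one period of production
    w, r = arrive()                          # next order arrives
```

Under these assumptions the greedy decision needs only two evaluations of a scalar-input network per order, which is the dimensionality reduction the abstract refers to; a production version would replace the toy arrival model and the hand-rolled network with the real order process and a standard deep-learning library.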

Key words: Order acceptance, Reinforcement learning, Markov decision process, Neural network, After-state

CLC Number: TP399
[1] MILLER B L. A Queueing Reward System with Several Customer Classes[J]. Management Science, 1969, 16(3): 234-245.
[2] ABEDI A, ZHU W H. An advanced order acceptance model for hybrid production strategy[J]. Journal of Manufacturing Systems, 2020, 55: 82-93.
[3] ZHANG X, MA S H. Order acceptance with limited capacity and finite output buffers in MTO environment[J]. Industrial Engineering and Management, 2008, 13(2): 34-38.
[4] GAO H L, DAN B, YAN J. Integrated order selection and scheduling decisions in the MTO environment considering the time series associations[J]. Journal of Management Engineering, 2017, 31(3): 108-116.
[5] FAN L F, CHEN X. Order Acceptance Policy based on EMSR Method[J]. Management Review, 2010, 22(4): 109-113.
[6] WANG Z, QI Y Q, CUI H R, et al. A hybrid algorithm for order acceptance and scheduling problem in make-to-stock/make-to-order industries[J]. Computers & Industrial Engineering, 2019, 127: 841-852.
[7] TARIK A, KOBE G, KUNAL K, et al. Production planning with order acceptance and demand uncertainty[J]. Computers & Operations Research, 2018, 91: 145-159.
[8] FAN L F, CHEN X. Order pricing and acceptance policy in make-to-order firm based on revenue management[J]. Systems Engineering, 2011, 29(2): 87-93.
[9] LI X, VENTURA J A. Exact algorithms for a joint order acceptance and scheduling problem[J]. International Journal of Production Economics, 2020, 223: 107516.
[10] ROM W O, SLOTNICK S A. Order acceptance using genetic algorithms[J]. Computers & Operations Research, 2008, 36(6): 1758-1767.
[11] NOBIBON F T, LEUS R. Exact algorithms for a generalization of the order acceptance and scheduling problem in a single-machine environment[J]. Computers & Operations Research, 2010, 38(1): 367-378.
[12] CESARET B, OGUZ C, SALMAN F S. A tabu search algorithm for order acceptance and scheduling[J]. Computers & Operations Research, 2010, 39(6): 1197-1205.
[13] WANG L, XU Z Y, ZHAO Y, et al. Model and algorithm for order acceptance on multi-node production environment with limited buffer[J]. Chinese Journal of Management Science, 2015, 23(12): 135-141.
[14] RAHMAN H F, JANARDHANAN M N, NIELSEN L E. Real-time order acceptance and scheduling problems in a flow shop environment using hybrid GA-PSO algorithm[J]. IEEE Access, 2019, 7: 112742-112755.
[15] LI X P, WANG J, SAWHNEY R. Reinforcement learning for joint pricing, lead-time and scheduling decisions in make-to-order systems[J]. European Journal of Operational Research, 2012, 221(1): 99-109.
[16] ARREDONDO F, MARTINEZ E. Learning and adaptation of a policy for dynamic order acceptance in make-to-order manufacturing[J]. Computers & Industrial Engineering, 2009, 58(1): 70-83.
[17] HAO J, YU J J, ZHOU W H. Order acceptance policy in make-to-order manufacturing based on average-reward reinforcement learning[J]. Journal of Computer Applications, 2013, 33(4): 976-979.
[18] WANG X H, WANG N N, FAN Z P. Reinforcement learning based order acceptance policy in make-to-order enterprises[J]. Systems Engineering-Theory & Practice, 2014, 34(12): 3121-3129.
[19] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge, MA: MIT Press, 2011.
[20] LEWICKI G, MARINO G. Approximation by superpositions of a sigmoidal function[J]. Journal for Analysis and Its Applications, 2003, 22(2): 463-470.
[21] MITCHELL T. Machine Learning[M]. New York: McGraw-Hill, 1997.
[22] RIEDMILLER M. Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method[C]//European Conference on Machine Learning (ECML 2005). Porto, Portugal, 2005: 317-328.
[23] HERBOTS J, HERROELEN W, LEUS R. Dynamic order acceptance and capacity planning on a single bottleneck resource[J]. Naval Research Logistics, 2007, 54(8): 874-889.
[24] HING M M, HARTEN A V, SCHUUR P. Reinforcement learning versus heuristics for order acceptance on a single resource[J]. Journal of Heuristics, 2007, 13(2): 167-187.
[25] CHARNSIRISAKSKUL K, GRIFFIN P M, KESKINOCAK P. Order selection and scheduling with leadtime flexibility[J]. IIE Transactions, 2004, 36(7): 697-707.
• Related Articles
[1] ZHOU Fang-quan, CHENG Wei-qing. Sequence Recommendation Based on Global Enhanced Graph Neural Network [J]. Computer Science, 2022, 49(9): 55-63.
[2] ZHOU Le-yuan, ZHANG Jian-hua, YUAN Tian-tian, CHEN Sheng-yong. Sequence-to-Sequence Chinese Continuous Sign Language Recognition and Translation with Multi- layer Attention Mechanism Fusion [J]. Computer Science, 2022, 49(9): 155-161.
[3] LIU Xing-guang, ZHOU Li, LIU Yan, ZHANG Xiao-ying, TAN Xiang, WEI Ji-bo. Construction and Distribution Method of REM Based on Edge Intelligence [J]. Computer Science, 2022, 49(9): 236-241.
[4] NING Han-yang, MA Miao, YANG Bo, LIU Shi-chang. Research Progress and Analysis on Intelligent Cryptology [J]. Computer Science, 2022, 49(9): 288-296.
[5] WANG Run-an, ZOU Zhao-nian. Query Performance Prediction Based on Physical Operation-level Models [J]. Computer Science, 2022, 49(8): 49-55.
[6] CHEN Yong-quan, JIANG Ying. Analysis Method of APP User Behavior Based on Convolutional Neural Network [J]. Computer Science, 2022, 49(8): 78-85.
[7] ZHU Cheng-zhang, HUANG Jia-er, XIAO Ya-long, WANG Han, ZOU Bei-ji. Deep Hash Retrieval Algorithm for Medical Images Based on Attention Mechanism [J]. Computer Science, 2022, 49(8): 113-119.
[8] YUAN Wei-lin, LUO Jun-ren, LU Li-na, CHEN Jia-xing, ZHANG Wan-peng, CHEN Jing. Methods in Adversarial Intelligent Game:A Holistic Comparative Analysis from Perspective of Game Theory and Reinforcement Learning [J]. Computer Science, 2022, 49(8): 191-204.
[9] YAN Jia-dan, JIA Cai-yan. Text Classification Method Based on Information Fusion of Dual-graph Neural Network [J]. Computer Science, 2022, 49(8): 230-236.
[10] SHI Dian-xi, ZHAO Chen-ran, ZHANG Yao-wen, YANG Shao-wu, ZHANG Yong-jun. Adaptive Reward Method for End-to-End Cooperation Based on Multi-agent Reinforcement Learning [J]. Computer Science, 2022, 49(8): 247-256.
[11] HAO Zhi-rong, CHEN Long, HUANG Jia-cheng. Class Discriminative Universal Adversarial Attack for Text Classification [J]. Computer Science, 2022, 49(8): 323-329.
[12] DAI Zhao-xia, LI Jin-xin, ZHANG Xiang-dong, XU Xu, MEI Lin, ZHANG Liang. Super-resolution Reconstruction of MRI Based on DNGAN [J]. Computer Science, 2022, 49(7): 113-119.
[13] LIU Yue-hong, NIU Shao-hua, SHEN Xian-hao. Virtual Reality Video Intraframe Prediction Coding Based on Convolutional Neural Network [J]. Computer Science, 2022, 49(7): 127-131.
[14] XU Ming-ke, ZHANG Fan. Head Fusion:A Method to Improve Accuracy and Robustness of Speech Emotion Recognition [J]. Computer Science, 2022, 49(7): 132-141.
[15] PENG Shuang, WU Jiang-jiang, CHEN Hao, DU Chun, LI Jun. Satellite Onboard Observation Task Planning Based on Attention Neural Network [J]. Computer Science, 2022, 49(7): 242-247.