Markov决策过程不确定策略特征模式

计算机科学 ›› 2013, Vol. 40 ›› Issue (4): 263-266.

Markov决策过程不确定策略特征模式

黄镇谨,陆阳,杨娟,方欢

合肥工业大学计算机与信息学院合肥230009;合肥工业大学计算机与信息学院合肥230009;合肥工业大学计算机与信息学院合肥230009;合肥工业大学计算机与信息学院合肥230009

出版日期:2018-11-16 发布日期:2018-11-16
基金资助:
本文受国家自然科学基金资助

Property Patterns of Markov Decision Process Nondeterministic Choice Scheduler

HUANG Zhen-jin,LU Yang,YANG Juan and FANG Huan

Online:2018-11-16 Published:2018-11-16

摘要/Abstract

摘要： 马尔科夫决策过程可以建模具有不确定性特征的复杂系统,而在进行模型分析时需要采用策略对不确定性进行处理。首先,研究不同策略下时空有界可达概率问题,给出不确定性解决策略的定义及分类方法。其次,在时间无关策略下,证明基于确定性选取动作和随机选取动作的时空有界可达概率的一致性,并且论证了时间依赖策略相对于时间无关策略具有更好的时空有界可达概率。最后结合实例简要阐述了结论的正确性。

关键词: 马尔科夫决策过程,不确定性策略,时空有界可达概率

Abstract: Markov decision process can model complex system with nondeterminism．Schedulers are required to resolve the nonderministic choices during model analysis．This paper introduced the time-and space-bounded reachability probabilities of markov decision process under different schedulers．Firstly,the formal definition and classification method of schedulers for nonderminism were proposed and then we proved that the reachability probabilities coincide for determini-stic and randomized schedulers under time-abstract．Also,it was proved that time-dependent scheduler generally induces probability bounds that exceed those of the corresponding time-abstract．At the end of paper,two cases were illustrated for describing the correctness of the conclusion.

Key words: Markov decision process,Nonderministic scheduler,Time-and space-bounded reachability probability

黄镇谨,陆阳,杨娟,方欢. Markov决策过程不确定策略特征模式[J]. 计算机科学, 2013, 40(4): 263-266. https://doi.org/

HUANG Zhen-jin,LU Yang,YANG Juan and FANG Huan. Property Patterns of Markov Decision Process Nondeterministic Choice Scheduler[J]. Computer Science, 2013, 40(4): 263-266. https://doi.org/

参考文献

[1] 钮俊,曾国荪,吕新荣．随机模型检测连续时间Markov过程[J]．计算机科学,2011,8(9):112-115
[2] Puterman M L．Markov Decision Processes:Discrete Stochastic Dynamic Programming[M]．NewYork:Wiley,1994
[3] Markus B．Optimal Schedulers for Time-Bounded Reachability in CTMDPs[D]．Homburg:Saarland University,2009
[4] Baier C,Cloth L,Haverkort B,et al．Model checking Markovchains with aciton and state labels[J]．IEEE Transactions on Software Engineering,2007,3(4):209-224
[5] Cloth L．Model checking algorithms for Markov reward models [D]．Enschede,University of Twente,2006
[6] Hermanns H．Interactive Markov Chains[M]．Heidelberg:Sp-ringer,2002
[7] Baier C,Hermanns H,Katoen J-P,et al．Efficient computation of time-bounded reachability probabilities in uniform continuous-time Markov decision processes[J].Theoretical Computer Scien-ce,2005,345(1):2-26
[8] Brazdil T,Forejt V,Krcal J,et al.Continuous-time stochasticgames with time bounded reachability[A]∥Conference on Foundations of Software Technology and Theoretical Computer Science,2009[C]．2009:61-72
[9] Lijun Z,Neuhuβer M R．Model checking interactive Markovchains[A]∥Proceedings of TACAS,2010[C].2010:53-68
[10] Neuhuβer M-R,Zhang L．Time-bounded reachability probabilities in continuous-time Markov decision processes[A]∥7th International Conference on Quantitative Evaluation of Systems,2010[C]．2010:209-218
[11] Rabe M,Schewe S．Finite optimal control for time-boundedreachability in CTMDPs and continuous-time Markov games[A]∥CoRR,2010[C]．2010:1004-4005
[12] Buchholz P,Hahn E M,Hermanns H,et al．Model Checking Algorithms for CTMDPs[A]∥Proceedings of the 23rd international conference on Computer aided verification,CAV’11,1[C]．Berlin,Heidelberg:Springer-Verlag,2011:225-242
[13] 钮俊,曾国荪,王伟．基于模型检测的时间空间性能验证方法[J]．计算机学报,2010,3(9):1624-1633

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed