Computer Science, 2009, Vol. 36, Issue (9): 161-166.

• Artificial Intelligence •

Associated Model of Bi-Markov Decision Processes

WANG Zhen-zhen, XING Han-cheng

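  1. (Department of Computer Science and Technology, Nanjing University, Nanjing 210093); (School of Computer Science and Engineering, Southeast University, Nanjing 210096)
  • Online: 2018-11-16  Published: 2018-11-16
  • Funding:
    Supported by the National Natural Science Foundation of China (90412014, 60803061) and the Natural Science Foundation of Jiangsu Province (BK2008293).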

Abstract: When dealing with problems, human thinking often proceeds on two levels. People first grasp a problem as a whole, i.e., they form a general plan, and then they work out the details. Humans are thus an excellent example of a multi-resolution intelligent system: they can generalize bottom-up across multiple levels (the granularity of the viewpoint on a problem becomes "coarser", analogous to abstraction) and can also instantiate top-down (the granularity becomes "finer", analogous to concretization). Based on this observation, we construct a semi-Markov decision process composed of two Markov decision processes, each running on its own level: the ideal space (generalization) and the actual space (instantiation). We call it the associated bi-Markov decision process model. We then discuss how to find the optimal policy under this associated model. Finally, an example shows that the associated bi-Markov decision process model can economize on "mind" and strikes a good tradeoff between computational validity and computational feasibility.

Key words: Markov decision processes, Reinforcement learning, Optimal policy
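The abstract does not give the authors' optimal-policy algorithm, so the following is only a minimal sketch of a related standard technique, not the paper's method: value iteration for a semi-Markov decision process. Each high-level (ideal-space) action is assumed to run for a variable number k of low-level (actual-space) steps, so the backup is V(s) = max_a sum_{s', k} p(s', k | s, a) [r + gamma^k V(s')]. The model format, the functions `q_value` and `smdp_value_iteration`, and all states, rewards, durations, and gamma = 0.9 are hypothetical.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL model format (not from the paper).
# One outcome of a high-level action: (probability, next_state,
# duration_k in low-level steps, reward accumulated during those k steps).
Outcome = Tuple[float, str, int, float]
# state -> action -> list of outcomes; a state with no actions is terminal (value 0).
SMDP = Dict[str, Dict[str, List[Outcome]]]


def q_value(outcomes: List[Outcome], values: Dict[str, float], gamma: float) -> float:
    """Expected return of one action; the successor value is discounted by gamma**k."""
    return sum(p * (r + gamma ** k * values[s2]) for p, s2, k, r in outcomes)


def smdp_value_iteration(model: SMDP, gamma: float = 0.9, tol: float = 1e-8,
                         max_iter: int = 10_000) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Standard SMDP value iteration followed by greedy policy extraction."""
    values = {s: 0.0 for s in model}
    for _ in range(max_iter):
        # Synchronous Bellman backup over all states; terminal states stay at 0.
        new_values = {
            s: max((q_value(o, values, gamma) for o in actions.values()), default=0.0)
            for s, actions in model.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in model)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values (terminal states omitted).
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in model.items() if actions
    }
    return values, policy


# Toy example with hypothetical numbers: "short" takes 1 step to B,
# "long" takes 4 steps straight to the goal but earns little reward.
model: SMDP = {
    "A": {"short": [(1.0, "B", 1, 0.0)], "long": [(1.0, "goal", 4, 1.0)]},
    "B": {"finish": [(1.0, "goal", 1, 10.0)]},
    "goal": {},
}
values, policy = smdp_value_iteration(model)
print(policy)  # {'A': 'short', 'B': 'finish'}
```

Carrying the duration k inside each outcome is what distinguishes this from ordinary MDP value iteration; with every k = 1 it reduces to the usual Bellman backup. In the toy model, "short" wins at A because 0.9 × 10 = 9 exceeds the 1.0 earned by "long".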
