Computer Science (计算机科学), 2023, Vol. 50, Issue (6): 266-273. doi: 10.11896/jsjkx.230300044

• Artificial Intelligence

Self Reconfiguration Algorithm of Modular Robot Based on Swarm Agent Deep Reinforcement Learning

WANG Hanmo, ZHENG Shijie, XU Ruonan, GUO Bin, WU Lei

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
  • Received: 2023-03-04  Revised: 2023-04-13  Online: 2023-06-15  Published: 2023-06-06
  • Corresponding author: GUO Bin (guob@nwpu.edu.cn)
  • About author: WANG Hanmo (whm2001@mail.nwpu.edu.cn), born in 2001, undergraduate, member of China Computer Federation. His main research interest is modular robots. GUO Bin, born in 1980, Ph.D., professor, Ph.D. supervisor, member of China Computer Federation. His main research interests include ubiquitous computing and crowd intelligence with the deep fusion of humans, machines and things.
  • Supported by:
    National Science Fund for Distinguished Young Scholars (62025205) and National Natural Science Foundation of China (62032020, 62102317).

Abstract: Modular robots are built from a number of standard modules, each with independent functions. Self-reconfiguration is currently a popular yet difficult problem in modular robot research. When the number of modules is large and the problem becomes complex, traditional graph-theoretic or search algorithms cannot find a general optimal solution in polynomial time, and their complexity grows exponentially with the number of modules. Taking a swarm-agent deep reinforcement learning perspective, this paper treats each homogeneous module as an individual agent with learning and perception capabilities and proposes a QMIX-based self-reconfiguration algorithm for modular robots. For this algorithm, a new reward function is designed, and the agents' action space is restricted so that they can move in parallel. This partially solves the coordination and cooperation problem among multiple agents and achieves the transition from the initial configuration to the target configuration. Experiments with 9 modules compare the proposed algorithm with a traditional A*-based search algorithm in terms of success rate and average number of steps. The results show that, under a reasonable time-step limit, the QMIX-based algorithm reaches a success rate above 95%, both algorithms need about 12 steps on average, and the QMIX algorithm approaches the performance of the traditional algorithm.

Key words: Modular robot, Self-reconfiguration, Swarm agent collaboration, Deep reinforcement learning, Configuration space and action space
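The abstract names QMIX as the basis of the algorithm but does not describe the network. As a rough, non-authoritative illustration of the property QMIX relies on, the stdlib-only sketch below builds a toy monotonic mixer: linear hypernetworks (an assumption; their weights here are random and untrained) map a global state vector to mixing weights, and taking the absolute value of those weights makes Q_tot non-decreasing in every agent's Q-value. As a result, each module can pick its greedy action from its own Q-values and the joint action still maximizes Q_tot. The names `MonotonicMixer` and `greedy_vs_joint_optimum`, the layer sizes, and the state dimension are hypothetical, not the paper's.

```python
import itertools
import math
import random


def elu(x):
    """ELU activation (used in the hidden layer of the original QMIX mixer)."""
    return x if x > 0 else math.exp(x) - 1.0


def affine(weights, bias, x):
    """Dense layer y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MonotonicMixer:
    """Toy QMIX-style mixer with random (untrained) hypernetwork weights.

    Hypernetworks map the global state s to the mixer's weights; abs() keeps
    those weights non-negative, so Q_tot is non-decreasing in each agent's Q.
    """

    def __init__(self, n_agents, state_dim, embed_dim=4, seed=0):
        rng = random.Random(seed)
        self.n, self.e = n_agents, embed_dim
        self.hyper_w1 = rand_matrix(n_agents * embed_dim, state_dim, rng)
        self.hyper_b1 = rand_matrix(embed_dim, state_dim, rng)
        self.hyper_w2 = rand_matrix(embed_dim, state_dim, rng)
        self.hyper_v1 = rand_matrix(embed_dim, state_dim, rng)  # V(s): hidden layer
        self.hyper_v2 = rand_matrix(1, embed_dim, rng)          # V(s): output layer

    def __call__(self, agent_qs, state):
        zeros = [0.0] * (self.n * self.e)
        w1 = [abs(v) for v in affine(self.hyper_w1, zeros, state)]          # n*e weights >= 0
        b1 = affine(self.hyper_b1, zeros[: self.e], state)                    # unconstrained bias
        w2 = [abs(v) for v in affine(self.hyper_w2, zeros[: self.e], state)]  # e weights >= 0
        v_hidden = [max(0.0, h) for h in affine(self.hyper_v1, zeros[: self.e], state)]
        v = affine(self.hyper_v2, [0.0], v_hidden)[0]                         # state-value bias
        hidden = [
            elu(sum(agent_qs[a] * w1[a * self.e + j] for a in range(self.n)) + b1[j])
            for j in range(self.e)
        ]
        return sum(h * w for h, w in zip(hidden, w2)) + v


def greedy_vs_joint_optimum(n_agents=3, n_actions=4, seed=1):
    """Compare Q_tot at the per-agent greedy actions with the best Q_tot
    found by brute force over all joint actions."""
    rng = random.Random(seed)
    state = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    # Hypothetical per-agent Q-values Q_a(o_a, u_a), one row per agent.
    q_table = [[rng.uniform(-1.0, 1.0) for _ in range(n_actions)] for _ in range(n_agents)]
    mixer = MonotonicMixer(n_agents, len(state), seed=seed)
    greedy = [max(range(n_actions), key=lambda u, q=q: q[u]) for q in q_table]
    q_greedy = mixer([q_table[a][greedy[a]] for a in range(n_agents)], state)
    q_best = max(
        mixer([q_table[a][u[a]] for a in range(n_agents)], state)
        for u in itertools.product(range(n_actions), repeat=n_agents)
    )
    return q_greedy, q_best


if __name__ == "__main__":
    q_greedy, q_best = greedy_vs_joint_optimum()
    print(f"Q_tot at per-agent greedy actions: {q_greedy:.6f}")
    print(f"best Q_tot over all joint actions: {q_best:.6f}")
```

Running the script should print the same value twice: monotonic mixing means the decentralized greedy choice already maximizes Q_tot, so no search over the 4^3 joint actions is needed at execution time. In a trained system the per-agent Q-values would come from learned agent networks and the mixer would be trained jointly with them; here both are random.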

CLC number: TP242.6
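The abstract uses a traditional A*-based search as the baseline but does not specify its state space, move rules, or heuristic. The sketch below is one plausible toy version under explicitly simplified assumptions: identical modules on an unbounded 2D square lattice, one module moving per step to any empty cell among its eight neighbours, and a move allowed only if all modules stay face-connected afterwards (collision and support constraints of real hardware are ignored). The heuristic counts target cells that are still empty; since a move relocates a single module, it can fill at most one of them, so the heuristic never overestimates and A* returns a shortest move sequence. The names `astar_reconfigure`, `successors`, and `heuristic` are hypothetical.

```python
import heapq
from itertools import count

# Simplified move model (assumption): a module may step to any empty cell among
# its 8 lattice neighbours, provided all modules stay face-connected afterwards.
STEPS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
FACES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def is_connected(cells):
    """True if the occupied cells form one 4-connected (face-connected) group."""
    cells = set(cells)
    start = next(iter(cells))
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for dx, dy in FACES:
            nxt = (x + dx, y + dy)
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(cells)


def successors(config):
    """All configurations reachable by moving exactly one module one step."""
    for cell in config:
        for dx, dy in STEPS:
            dest = (cell[0] + dx, cell[1] + dy)
            if dest in config:
                continue
            new = (config - {cell}) | {dest}
            if is_connected(new):
                yield new


def heuristic(config, target):
    """Empty target cells; one move fills at most one, so this never overestimates."""
    return len(target - config)


def astar_reconfigure(start, target, max_expansions=100_000):
    """Shortest sequence of configurations from start to target, or None."""
    start, target = frozenset(start), frozenset(target)
    tie = count()  # tie-breaker so heapq never compares frozensets
    frontier = [(heuristic(start, target), next(tie), 0, start)]
    best_g, parent = {start: 0}, {start: None}
    expansions = 0
    while frontier:
        _, _, g, config = heapq.heappop(frontier)
        if g > best_g[config]:
            continue  # stale queue entry
        if config == target:
            path = []
            while config is not None:
                path.append(config)
                config = parent[config]
            return path[::-1]
        expansions += 1
        if expansions > max_expansions:
            return None  # search budget exhausted
        for nxt in successors(config):
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                parent[nxt] = config
                heapq.heappush(frontier, (g + 1 + heuristic(nxt, target), next(tie), g + 1, nxt))
    return None


if __name__ == "__main__":
    row = {(0, 0), (1, 0), (2, 0)}
    column = {(1, -1), (1, 0), (1, 1)}
    path = astar_reconfigure(row, column)
    print(f"{len(path) - 1} moves")
    for step in path:
        print(sorted(step))
```

Under these rules, turning a horizontal row of three modules into a vertical column should take 2 moves. Every configuration is a search state, so the number of states grows very quickly as modules are added, which is the scaling problem the abstract attributes to search-based methods.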