Computer Science, 2024, Vol. 51, Issue 11A: 240100211-11. DOI: 10.11896/jsjkx.240100211
• Intelligent Computing •
AN Yang1,2,3, WANG Xiuqing1,2,3, ZHAO Minghua1