Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250200121-8.doi: 10.11896/jsjkx.250200121

• Artificial Intelligence • Previous Articles     Next Articles

Hybrid Reinforcement Learning Algorithm Combined with 2-opt for Solving Traveling Salesman Problem

PENG Junlong1, FAN Jing2   

  1. 1 College of Computer and Information Engineering,Shanghai Polytechnic University,Shanghai 201209,China
    2 School of Mathematics,Physic and Statistics,Shanghai Polytechnic University,Shanghai 201209,China
  • Online:2025-11-15 Published:2025-11-10

Abstract: TSP is a classic NP-hard combinatorial optimization problem in operations research,aimed at the shortest possible cycle that visits each city exactly once from the origin point and returns back.For solving TSP,this paper presents a hybrid deep reinforcement learning approach(2+HRL) based on a pointer network,integrating graph attention mechanisms and the 2-opt heuristic.Specifically,graph attention networks(GATs) capture both local and global structural information of cities,while bidirectional LSTMs dynamically encode path dependencies for context-aware state representation.During training stage,the 2-opt strategy iteratively improves paths by local edge swaps to enhance solution quality.The policy gradient optimization via the REINFORCE algorithm is combined with an entropy reward function to avoid local optima,while a value network enhances parameter estimation accuracy.Experimental results show that 2+HRL algorithm performs better than traditional heuristics and exact algorithms,and it has faster computational speed and higher precision if limited with fewer training iterations,and its performance exceeds other deep reinforcement learning approaches as training progresses increment.

Key words: Graph attention network, Traveling salesman problem, Deep reinforcement learning, 2-opt, Combinatorial optimization

CLC Number: 

  • TP181
[1]APPLEGATE D L,BIXBY R E,CHVÁTAL V,et al.Certification of an optimal TSP tour through 85 900 cities[J].Operations Research Letters,2009,37(1):11-15.
[2]COVER T,HART P.Nearest neighbor pattern classification[J].IEEE Transactionson Information Theory,1967,13(1):21-27.
[3]HELSGAUN K.An extension of the Lin-Kernighan-HelsgaunTSP solver for constrained traveling salesman and vehicle routing problems[J].Roskilde:Roskilde University,2017,12:966-980.
[4]HELSGAUN K.General k-opt submoves for the Lin-KernighanTSP heuristic[J].Mathematical Programming Computation,2009,1:119-163.
[5]KIRKPATRICK S,GELATT JR C D,VECCHIM P.Optimization by simulated annealing[J].Science,1983,220(4598):671-680.
[6]HOPFIELD J J,TANK D W.“Neural” computation of decisionsin optimization problems[J].Biological Cybernetics,1985,52(3):141-152.
[7]VINYALS O,FORTUNATO M,JAITLY N.Pointer networks[C]//Advances in Neural Information Processing Systems.2015.
[8]NOWAK A,VILLAR S,BANDEIRAA S,et al.A note onlearning algorithms for quadratic assignment with graph neural networks[J].Stat,2017,1050:22.
[9]BELLO I,PHAM H,LE Q V,et al.Neural combinatorial optimization with reinforcement learning[C]//International Confe-rence on Learning Representations.2017:1-13.
[10]KHALIL E,DAI H,ZHANG Y,et al.Learning combinatorialoptimization algorithms over graphs[C]//Advances in Neural Information Processing Systems.2017.
[11]KOOL W,VAN HOOF H,WELLING M.Attention,learn tosolve routing problems[C]//International Conference on Learning Representations.2018.
[12]JOSHI C K,LAURENT T,BRESSON X.An efficient graphconvolutional network technique for the travelling salesman problem[J].arXiv:1906.01227,2019.
[13]WU Y,SONG W,CAO Z,et al.Learning improvement heuristics for solving routing problems[J].IEEE Transactions on Neural Networks and Learning Systems,2021,33(9):5057-5069.
[14]DEUDON M,COURNUT P,LACOSTE A,et al.Learning heuristics for the tsp by policy gradient[C]//15th International Conference Integration of Constraint Programming,Artificial Intelligence,and Operations Research(CPAIOR 2018).Springer,2018:170-181.
[15]DA COSTA P R,RHUGGENAATH J,ZHANG Y,et al.Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning[C]//Asian Conference on Machine Learning.PMLR,2020:465-480.
[16]SUI J,DING S,LIU R,et al.Learning 3-opt heuristics for traveling salesman problem via deep reinforcement learning[C]//Asian Conference on Machine Learning.PMLR,2021:1301-1316.
[17]UDDIN F,RIAZ N,MANAN A,et al.An improvement to the 2-opt heuristic algorithm for approximation of optimal TSP tour[J].Applied Sciences,2023,13(12):7339.
[18]WANG Y,CHEN Z,YANG X,et al.Solving the TSP Problem with Deep Reinforcement Learning Combined with Graph Attention Model[J].Journal of Nanjing University(Natural Science Edition),2022,58(3):420-429.
[19]PERRON L,FURNON V.Or-tools[EB/OL].https://developers.google.com/optimization/.
[1] ZHU Shihao, PENG Kexing, MA Tinghuai. Graph Attention-based Grouped Multi-agent Reinforcement Learning Method [J]. Computer Science, 2025, 52(9): 330-336.
[2] CHEN Jintao, LIN Bing, LIN Song, CHEN Jing, CHEN Xing. Dynamic Pricing and Energy Scheduling Strategy for Photovoltaic Storage Charging Stations Based on Multi-agent Deep Reinforcement Learning [J]. Computer Science, 2025, 52(9): 337-345.
[3] ZHOU Tao, DU Yongping, XIE Runfeng, HAN Honggui. Vulnerability Detection Method Based on Deep Fusion of Multi-dimensional Features from Heterogeneous Contract Graphs [J]. Computer Science, 2025, 52(9): 368-375.
[4] ZHANG Yongliang, LI Ziwen, XU Jiahao, JIANG Yuchen, CUI Ying. Congestion-aware and Cached Communication for Multi-agent Pathfinding [J]. Computer Science, 2025, 52(8): 317-325.
[5] LI Mengxi, GAO Xindan, LI Xue. Two-way Feature Augmentation Graph Convolution Networks Algorithm [J]. Computer Science, 2025, 52(7): 127-134.
[6] HUO Dan, YU Fuping, SHEN Di, HAN Xueyan. Research on Multi-machine Conflict Resolution Based on Deep Reinforcement Learning [J]. Computer Science, 2025, 52(7): 271-278.
[7] HUANG Ao, LI Min, ZENG Xiangguang, PAN Yunwei, ZHANG Jiaheng, PENG Bei. Adaptive Hybrid Genetic Algorithm Based on PPO for Solving Traveling Salesman Problem [J]. Computer Science, 2025, 52(6A): 240600096-6.
[8] LI Yingjian, WANG Yongsheng, LIU Xiaojun, REN Yuan. Cloud Platform Load Data Forecasting Method Based on Spatiotemporal Graph AttentionNetwork [J]. Computer Science, 2025, 52(6A): 240700178-8.
[9] ZHAO Yaoshuai, ZHANG Yi. Modeling of Civil Aviation Passenger Individual and Social Preferences and Optimization of Flight Seat Allocation [J]. Computer Science, 2025, 52(6A): 240600038-8.
[10] WU Zongming, CAO Jijun, TANG Qiang. Online Parallel SDN Routing Optimization Algorithm Based on Deep Reinforcement Learning [J]. Computer Science, 2025, 52(6A): 240900018-9.
[11] WANG Chenyuan, ZHANG Yanmei, YUAN Guan. Class Integration Test Order Generation Approach Fused with Deep Reinforcement Learning andGraph Convolutional Neural Network [J]. Computer Science, 2025, 52(6): 58-65.
[12] ZHAO Xuejian, YE Hao, LI Hao, SUN Zhixin. Multi-AGV Path Planning Algorithm Based on Improved DDPG [J]. Computer Science, 2025, 52(6): 306-315.
[13] LI Yuanbo, HU Hongchao, YANG Xiaohan, GUO Wei, LIU Wenyan. Intrusion Tolerance Scheduling Algorithm for Microservice Workflow Based on Deep Reinforcement Learning [J]. Computer Science, 2025, 52(5): 375-383.
[14] ZHENG Longhai, XIAO Bohuai, YAO Zewei, CHEN Xing, MO Yuchang. Graph Reinforcement Learning Based Multi-edge Cooperative Load Balancing Method [J]. Computer Science, 2025, 52(3): 338-348.
[15] DU Likuan, LIU Chen, WANG Junlu, SONG Baoyan. Self-learning Star Chain Space Adaptive Allocation Method [J]. Computer Science, 2025, 52(3): 359-365.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!