计算机科学 ›› 2025, Vol. 52 ›› Issue (11A): 250200121-8.doi: 10.11896/jsjkx.250200121

• 人工智能 • 上一篇    下一篇

利用融合2-opt的强化学习算法求解TSP问题

彭俊龙1, 范静2   

  1. 1 上海第二工业大学计算机与信息工程学院 上海 201209
    2 上海第二工业大学数理与统计学院 上海 201209
  • 出版日期:2025-11-15 发布日期:2025-11-10
  • 通讯作者: 范静(fanjing@sspu.edu.cn)
  • 作者简介:pjl201209@163.com

Hybrid Reinforcement Learning Algorithm Combined with 2-opt for Solving Traveling Salesman Problem

PENG Junlong1, FAN Jing2   

  1. 1 College of Computer and Information Engineering,Shanghai Polytechnic University,Shanghai 201209,China
    2 School of Mathematics,Physic and Statistics,Shanghai Polytechnic University,Shanghai 201209,China
  • Online:2025-11-15 Published:2025-11-10

摘要: 旅行售货商问题(Traveling Salesman Problem,TSP)是运筹学中经典的组合优化问题,属于NP难问题。问题的目标是求解旅行商的环游路径,使其在经过每个城市一次后返回起点且路径长度最短。为求解此问题,提出基于指针网络的深度强化学习算法(2+HRL),融合了2-opt算法和图注意力模型。使用图注意力网络提取城市节点的局部和全局结构信息,运用双向LSTM进行路径信息提取,期间利用2-opt策略,通过局部交换改进路径;进而使用REINFORCE算法进行策略网络的梯度优化,结合熵奖励函数避免陷入局部最优解,使用值函数对评价网络参数进行改进。实验结果证明,2+HRL优于传统启发式算法和精确算法,而且与一些深度强化学习算法相比较时,在较低的训练次数下,2+HRL具有更快的计算速度,更准确的计算精度;在增加训练次数后,模型的优化效果也超越了相比较的其他深度强化学习算法。

关键词: 图注意力网络, 旅行售货商问题, 深度强化学习, 2-opt, 组合最优化

Abstract: TSP is a classic NP-hard combinatorial optimization problem in operations research,aimed at the shortest possible cycle that visits each city exactly once from the origin point and returns back.For solving TSP,this paper presents a hybrid deep reinforcement learning approach(2+HRL) based on a pointer network,integrating graph attention mechanisms and the 2-opt heuristic.Specifically,graph attention networks(GATs) capture both local and global structural information of cities,while bidirectional LSTMs dynamically encode path dependencies for context-aware state representation.During training stage,the 2-opt strategy iteratively improves paths by local edge swaps to enhance solution quality.The policy gradient optimization via the REINFORCE algorithm is combined with an entropy reward function to avoid local optima,while a value network enhances parameter estimation accuracy.Experimental results show that 2+HRL algorithm performs better than traditional heuristics and exact algorithms,and it has faster computational speed and higher precision if limited with fewer training iterations,and its performance exceeds other deep reinforcement learning approaches as training progresses increment.

Key words: Graph attention network, Traveling salesman problem, Deep reinforcement learning, 2-opt, Combinatorial optimization

中图分类号: 

  • TP181
[1]APPLEGATE D L,BIXBY R E,CHVÁTAL V,et al.Certification of an optimal TSP tour through 85 900 cities[J].Operations Research Letters,2009,37(1):11-15.
[2]COVER T,HART P.Nearest neighbor pattern classification[J].IEEE Transactionson Information Theory,1967,13(1):21-27.
[3]HELSGAUN K.An extension of the Lin-Kernighan-HelsgaunTSP solver for constrained traveling salesman and vehicle routing problems[J].Roskilde:Roskilde University,2017,12:966-980.
[4]HELSGAUN K.General k-opt submoves for the Lin-KernighanTSP heuristic[J].Mathematical Programming Computation,2009,1:119-163.
[5]KIRKPATRICK S,GELATT JR C D,VECCHIM P.Optimization by simulated annealing[J].Science,1983,220(4598):671-680.
[6]HOPFIELD J J,TANK D W.“Neural” computation of decisionsin optimization problems[J].Biological Cybernetics,1985,52(3):141-152.
[7]VINYALS O,FORTUNATO M,JAITLY N.Pointer networks[C]//Advances in Neural Information Processing Systems.2015.
[8]NOWAK A,VILLAR S,BANDEIRAA S,et al.A note onlearning algorithms for quadratic assignment with graph neural networks[J].Stat,2017,1050:22.
[9]BELLO I,PHAM H,LE Q V,et al.Neural combinatorial optimization with reinforcement learning[C]//International Confe-rence on Learning Representations.2017:1-13.
[10]KHALIL E,DAI H,ZHANG Y,et al.Learning combinatorialoptimization algorithms over graphs[C]//Advances in Neural Information Processing Systems.2017.
[11]KOOL W,VAN HOOF H,WELLING M.Attention,learn tosolve routing problems[C]//International Conference on Learning Representations.2018.
[12]JOSHI C K,LAURENT T,BRESSON X.An efficient graphconvolutional network technique for the travelling salesman problem[J].arXiv:1906.01227,2019.
[13]WU Y,SONG W,CAO Z,et al.Learning improvement heuristics for solving routing problems[J].IEEE Transactions on Neural Networks and Learning Systems,2021,33(9):5057-5069.
[14]DEUDON M,COURNUT P,LACOSTE A,et al.Learning heuristics for the tsp by policy gradient[C]//15th International Conference Integration of Constraint Programming,Artificial Intelligence,and Operations Research(CPAIOR 2018).Springer,2018:170-181.
[15]DA COSTA P R,RHUGGENAATH J,ZHANG Y,et al.Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning[C]//Asian Conference on Machine Learning.PMLR,2020:465-480.
[16]SUI J,DING S,LIU R,et al.Learning 3-opt heuristics for traveling salesman problem via deep reinforcement learning[C]//Asian Conference on Machine Learning.PMLR,2021:1301-1316.
[17]UDDIN F,RIAZ N,MANAN A,et al.An improvement to the 2-opt heuristic algorithm for approximation of optimal TSP tour[J].Applied Sciences,2023,13(12):7339.
[18]WANG Y,CHEN Z,YANG X,et al.Solving the TSP Problem with Deep Reinforcement Learning Combined with Graph Attention Model[J].Journal of Nanjing University(Natural Science Edition),2022,58(3):420-429.
[19]PERRON L,FURNON V.Or-tools[EB/OL].https://developers.google.com/optimization/.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!