计算机科学 ›› 2025, Vol. 52 ›› Issue (11A): 250200121-8.doi: 10.11896/jsjkx.250200121
彭俊龙1, 范静2
PENG Junlong1, FAN Jing2
摘要: 旅行售货商问题(Traveling Salesman Problem,TSP)是运筹学中经典的组合优化问题,属于NP难问题。问题的目标是求解旅行商的环游路径,使其在经过每个城市一次后返回起点且路径长度最短。为求解此问题,提出基于指针网络的深度强化学习算法(2+HRL),融合了2-opt算法和图注意力模型。使用图注意力网络提取城市节点的局部和全局结构信息,运用双向LSTM进行路径信息提取,期间利用2-opt策略,通过局部交换改进路径;进而使用REINFORCE算法进行策略网络的梯度优化,结合熵奖励函数避免陷入局部最优解,使用值函数对评价网络参数进行改进。实验结果证明,2+HRL优于传统启发式算法和精确算法,而且与一些深度强化学习算法相比较时,在较低的训练次数下,2+HRL具有更快的计算速度,更准确的计算精度;在增加训练次数后,模型的优化效果也超越了相比较的其他深度强化学习算法。
中图分类号:
| [1]APPLEGATE D L,BIXBY R E,CHVÁTAL V,et al.Certification of an optimal TSP tour through 85 900 cities[J].Operations Research Letters,2009,37(1):11-15. [2]COVER T,HART P.Nearest neighbor pattern classification[J].IEEE Transactionson Information Theory,1967,13(1):21-27. [3]HELSGAUN K.An extension of the Lin-Kernighan-HelsgaunTSP solver for constrained traveling salesman and vehicle routing problems[J].Roskilde:Roskilde University,2017,12:966-980. [4]HELSGAUN K.General k-opt submoves for the Lin-KernighanTSP heuristic[J].Mathematical Programming Computation,2009,1:119-163. [5]KIRKPATRICK S,GELATT JR C D,VECCHIM P.Optimization by simulated annealing[J].Science,1983,220(4598):671-680. [6]HOPFIELD J J,TANK D W.“Neural” computation of decisionsin optimization problems[J].Biological Cybernetics,1985,52(3):141-152. [7]VINYALS O,FORTUNATO M,JAITLY N.Pointer networks[C]//Advances in Neural Information Processing Systems.2015. [8]NOWAK A,VILLAR S,BANDEIRAA S,et al.A note onlearning algorithms for quadratic assignment with graph neural networks[J].Stat,2017,1050:22. [9]BELLO I,PHAM H,LE Q V,et al.Neural combinatorial optimization with reinforcement learning[C]//International Confe-rence on Learning Representations.2017:1-13. [10]KHALIL E,DAI H,ZHANG Y,et al.Learning combinatorialoptimization algorithms over graphs[C]//Advances in Neural Information Processing Systems.2017. [11]KOOL W,VAN HOOF H,WELLING M.Attention,learn tosolve routing problems[C]//International Conference on Learning Representations.2018. [12]JOSHI C K,LAURENT T,BRESSON X.An efficient graphconvolutional network technique for the travelling salesman problem[J].arXiv:1906.01227,2019. [13]WU Y,SONG W,CAO Z,et al.Learning improvement heuristics for solving routing problems[J].IEEE Transactions on Neural Networks and Learning Systems,2021,33(9):5057-5069. [14]DEUDON M,COURNUT P,LACOSTE A,et al.Learning heuristics for the tsp by policy gradient[C]//15th International Conference Integration of Constraint Programming,Artificial Intelligence,and Operations Research(CPAIOR 2018).Springer,2018:170-181. [15]DA COSTA P R,RHUGGENAATH J,ZHANG Y,et al.Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning[C]//Asian Conference on Machine Learning.PMLR,2020:465-480. [16]SUI J,DING S,LIU R,et al.Learning 3-opt heuristics for traveling salesman problem via deep reinforcement learning[C]//Asian Conference on Machine Learning.PMLR,2021:1301-1316. [17]UDDIN F,RIAZ N,MANAN A,et al.An improvement to the 2-opt heuristic algorithm for approximation of optimal TSP tour[J].Applied Sciences,2023,13(12):7339. [18]WANG Y,CHEN Z,YANG X,et al.Solving the TSP Problem with Deep Reinforcement Learning Combined with Graph Attention Model[J].Journal of Nanjing University(Natural Science Edition),2022,58(3):420-429. [19]PERRON L,FURNON V.Or-tools[EB/OL].https://developers.google.com/optimization/. |
|
||