基于双重指针网络的车货匹配双重序列决策研究

doi:10.11896/jsjkx.210800257

摘要/Abstract

摘要： 由于我国对公路运输资源利用不均,车货供需问题成为如今的热点问题。车货供需匹配平台为最大化总体运力资源利用率,需要整合运输需求和运力,降低成本并提高效率。大部分平台通常采用启发式算法求解车货匹配问题,此类算法面对大规模的问题时存在寻优瓶颈。针对上述问题,首次将车货供需匹配问题转变成一种双重序列决策问题,据此研究适用于当今车货供需匹配环节的一种高效算法。首先,提出了一种车货匹配的数学模型,并将该模型抽象为双重序列决策问题,再创新性地提出双重指针网络算法求解该问题。本实验使用Actor-Critic算法作为模型的训练框架来训练双重指针网络,并评估了模型。经实验得,双重指针网络的车货匹配求解方法的寻优能力在小问题规模中与传统启发式算法相当,在大问题规模中超越启发式算法,同时时间消耗都大大下降。

关键词: 双重指针网络, 双重序列决策问题, 深度强化学习, 组合优化, 车货匹配, critic网络

Abstract: Due to the uneven utilization of road transportation resources in my country,the supply and demand of trucks and cargo become a hot issue today.In order to maximize the utilization of overall transportation resources,the truck and cargo supply-demand matching platform needs to integrate transportation demand and capacity,reduce costs and improve efficiency.The algorithms used by most platforms are usually heuristic algorithms to solve the problem of trucks-cargo matching.Such algorithms have a bottleneck in optimizing when faced with large-scale problems.In response to the above-mentioned problems,this paper transforms the supply-demand matching problem of vehicles and goods into a double sequence decision-making problem for the first time.Based on this,we study an efficient algorithm that is suitable for today’s vehicle and goods supply-demand matching links.First,a mathematical model of trucks-cargo matching is proposed and the model is abstracted as a double sequence decision problem,and then a dual-pointer-network algorithm is innovatively proposed to solve this problem.The experiment uses the Actor-Critic algorithm as the model training framework to train the dual-pointer-network and evaluates the model.Experiments show that the dual-pointer-network’s vehicle-to-cargo matching solution method is equivalent to traditional heuristic algorithms in small problem scales,and surpasses heuristic algorithms in large problem scales.At the same time,the time consumption is greatlyreduced.

Key words: Dual pointer network, Double sequence decision-making problem, Deep reinforcement learning, Combinatorial optimization, Trucks and cargo matching, Critic network

中图分类号:

TP311

蔡岳, 王恩良, 孙哲, 孙知信. 基于双重指针网络的车货匹配双重序列决策研究[J]. 计算机科学, 2022, 49(11A): 210800257-9. https://doi.org/10.11896/jsjkx.210800257

CAI Yue, WANG En-liang, SUN Zhe, SUN Zhi-xin. Study on Dual Sequence Decision-making for Trucks and Cargo Matching Based on Dual Pointer Network[J]. Computer Science, 2022, 49(11A): 210800257-9. https://doi.org/10.11896/jsjkx.210800257

参考文献

[1]WANG C,NI Y,YANG X.The Production Routing Problem Under Uncertain Environment[J].IEEE Access,2021,PP(99):1-1.
[2]ODILI J B.Combinatorial optimization in science and enginee-ring[J].Current science,2017,113(12):2268-2274.
[3]MARATHE M V,PERCUS A G,TORNEY D C.Combinatorial Optimization in Biology[Z].1999.
[4]CHEN C,LI C.Process Synthesis and Design Problems Based on a Global Particle Swarm Optimization Algorithm[J].IEEE Access,2021,PP(99):1-1.
[5]MELNIKOV,TSYGANOV,BULYCHOV.A Multi-heuristicAlgorithmic Skeleton for Hard Combinatorial Optimization Problems[C]//2nd International Joint Conference on Computational Sciences and Optimization(CSO).2009.
[6]GREBENNIK I,DUPAS R,URNIAIEVA I,et al.Mathematical Model of Containers Placement in Rail Terminal Operations Problem[C]//International Conference on Advanced Computer Information Technologies.Dept.of System Engineering,Kharkiv National University of Radioelectronics,Kharkiv,Ukraine;Laboratory IMS,University of Bordeaux,Bordeaux,France;Dept.of System Engineering,Kharkiv National University of Radioelectronics,Kharkiv,Ukraine;Dept.of,2019.
[7]SAHANA S K,JAIN A.High Performance Ant Colony Optimizer(HPACO) for Travelling Salesman Problem(TSP)[M].Springer International Publishing,2014.
[8]KALAKANTI A K,VERMA S,PAUL T,et al.RL SolVeRPro:Reinforcement Learning for Solving Vehicle Routing Problem[C]//2019 1st International Conference on Artificial Intelligence and Data Sciences(AiDAS).2019.
[9]HIFIM,YOUSSOUF A M,SAADI T,et al.A CooperativeSwarm Optimization-Based Algorithm for the Quadratic Multiple Knapsack Problem[C]//2020 7th International Conference on Control,Decision and Information Technologies(CoDIT).2020.
[10]MUNIEN C,MAHABEER S,DZITIRO E,et al.Metaheuristic Approaches for One-Dimensional Bin Packing Problem:A Comparative Performance Study[J].IEEE Access,2020,PP(99).
[11]CHEN B H.A variable neighborhood search algorithm with reinforcement learning for a real-life periodic vehicle routing problem with time windows and open routes[J].RAIRO-Operations Research,2020,54(5):1467-1494.
[12]HAO X.Optimization Models and Heuristic Method Based on Simulated Annealing Strategy for Traveling Salesman Problem[J].Applied Mechanics and Materials,2010,34-35(4):1180-1184.
[13]OUYANG A J,ZHOU Y Q.An improved PSO-ACO algorithm for solving large-scale TSP[J].Advanced Materials Research,2011,143-144:1154-1158.
[14]BAI D L,GUO Q P.An Optimized Genetic Algorithm for TSP[C]//Proceedings of 2008 International Symposium on Distributed Computing and Applications for Business Engineering and Science.2008.
[15]KAYSTHA S,AGARWAL S.Greedy genetic algorithm toBounded Knapsack Problem[C]//IEEE International Confe-rence on Computer Science & Information Technology.IEEE,2010:301-305.
[16]BELLO I,PHAM H,LE Q V,et al.Neural combinatorial optimization with reinforcement learning[J].arXiv:1611.09940,2016.
[17]GUO J N.Vehicle-Cargo Matching Using a Fuzzy Group Decision-Making Approach[J].Journal of Transportation Engineering and Information,2017,15(4):141-146.
[18]LI H.Research on Supply and Demand Matching of Vehiclesand Cargos Based on Stowage Logistics Information Service Platform[D].Beijing:Beijing Jiaotong University,2015.
[19]XIONG Y Q.LogisticsPublic Information Platform Freight Forwarding Matching and Credibility Motivation Mechanism[D].Beijing:Tsinghua University,2015.
[20]WU G S.Research on Matching Model of Vehicle Cargo Consi-dering Trading Party’s Preference[D].Nanjing:Nanjing University,2017.
[21]HU J L,BING C,HAN S G.Study on vehicles and goods ma-tching of arterial road freight platform based on TS algorithm[J].Journal of Zhejiang Sci-Tech University(Social Science Edition),2018,40(5):478-486.
[22]ZHANG Q J.Research on Combination Matching Model Based on Logistics Supply and Demand Information[D].Xi’an:Xidian University,2017.
[23]MU X W,CHEN Y,GAO S J,et al.Vehicleand Cargo Matching Method Based on Improved Quantum Evolutionary Algorithm[J].Chinese Journal of Management Science,2016,24(12):166-176.
[24]ZHAO C Y.Research on Vehicles and Cargos Matching Problem for Virtual Social Reserve Platform in M area[D].Beijing:Beijing Jiaotong University,2019.
[25]YU Y S,LIU X Y.Research on Vehicles and Cargos Matching Based on Improved Balance Algorithm [J].Journal of Wuhan University of Technology,2016,38(10):47-54.
[26]AFFAN M,JAWAID J,AHMED S U,et al.Solving Combinatorial Problems through Off-Policy Reinforcement Learning Methods[C]//2020 International Conference on Electrical,Communication,and Computer Engineering(ICECCE).2020.
[27]HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft actor-critic:Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International Conference on Machine Learning.PMLR,2018:1861-1870.
[28]VINYALS O,FORTUNATO M,JAITLY N.Pointer networks[J].arXiv:1506.03134,2015.
[29]SUTSKEVER I,VINYALS O,LE Q V.Sequence to sequence learning with neural networks[J].arXiv:1409.3215,2014.
[30]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[J].arXiv:1706.03762,2017.
[31]LEIBA J,KIROS J R,HINTON G E.Layer normalization[J].arXiv:1607.06450,2016.
[32]HE K M,ZHANG X Y,REN S Q,et al..Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[33]HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[34]SUN F,JIANG P,SUN H,et al.Multi-source pointer network for product title summarization[C]//Proceedings of the 27th ACM International Conference on Information and Knowledge Management.2018:7-16.
[35]SUTTON R S,MCALLESTER D A,SINGH S P,et al.Policy gradient methods for reinforcement learning with function approximation[C]//NIPs.1999:1057-1063.
[36]MAZYAVKINA N,SVIRIDOV S,IVANOV S,et al.Reinforcement learning for combinatorial optimization:A survey[J].arXiv:2003.03600,2020.
[37]LING H,FU Y,HUA M,et al.An Adaptive Parameter Controlled Ant Colony Optimization Approach for Peer-to-Peer Vehicle and Cargo Matching[J].IEEE Access,2021,9:15764-15777.

相关文章 15

[1]	熊丽琴, 曹雷, 赖俊, 陈希亮. 基于值分解的多智能体深度强化学习综述 Overview of Multi-agent Deep Reinforcement Learning Based on Value Factorization 计算机科学, 2022, 49(9): 172-182. https://doi.org/10.11896/jsjkx.210800112
[2]	于滨, 李学华, 潘春雨, 李娜. 基于深度强化学习的边云协同资源分配算法 Edge-Cloud Collaborative Resource Allocation Algorithm Based on Deep Reinforcement Learning 计算机科学, 2022, 49(7): 248-253. https://doi.org/10.11896/jsjkx.210400219
[3]	李梦菲, 毛莺池, 屠子健, 王瑄, 徐淑芳. 基于深度确定性策略梯度的服务器可靠性任务卸载策略 Server-reliability Task Offloading Strategy Based on Deep Deterministic Policy Gradient 计算机科学, 2022, 49(7): 271-279. https://doi.org/10.11896/jsjkx.210600040
[4]	谢万城, 李斌, 代玥玥. 空中智能反射面辅助边缘计算中基于PPO的任务卸载方案 PPO Based Task Offloading Scheme in Aerial Reconfigurable Intelligent Surface-assisted Edge Computing 计算机科学, 2022, 49(6): 3-11. https://doi.org/10.11896/jsjkx.220100249
[5]	洪志理, 赖俊, 曹雷, 陈希亮, 徐志雄. 基于遗憾探索的竞争网络强化学习智能推荐方法研究 Study on Intelligent Recommendation Method of Dueling Network Reinforcement Learning Based on Regret Exploration 计算机科学, 2022, 49(6): 149-157. https://doi.org/10.11896/jsjkx.210600226
[6]	李鹏, 易修文, 齐德康, 段哲文, 李天瑞. 一种基于深度学习的供热策略优化方法 Heating Strategy Optimization Method Based on Deep Learning 计算机科学, 2022, 49(4): 263-268. https://doi.org/10.11896/jsjkx.210300155
[7]	欧阳卓, 周思源, 吕勇, 谭国平, 张悦, 项亮亮. 基于深度强化学习的无信号灯交叉路口车辆控制 DRL-based Vehicle Control Strategy for Signal-free Intersections 计算机科学, 2022, 49(3): 46-51. https://doi.org/10.11896/jsjkx.210700010
[8]	代珊珊, 刘全. 基于动作约束深度强化学习的安全自动驾驶方法 Action Constrained Deep Reinforcement Learning Based Safe Automatic Driving Method 计算机科学, 2021, 48(9): 235-243. https://doi.org/10.11896/jsjkx.201000084
[9]	成昭炜, 沈航, 汪悦, 王敏, 白光伟. 基于深度强化学习的无人机辅助弹性视频多播机制 Deep Reinforcement Learning Based UAV Assisted SVC Video Multicast 计算机科学, 2021, 48(9): 271-277. https://doi.org/10.11896/jsjkx.201000078
[10]	梁俊斌, 张海涵, 蒋婵, 王天舒. 移动边缘计算中基于深度强化学习的任务卸载研究进展 Research Progress of Task Offloading Based on Deep Reinforcement Learning in Mobile Edge Computing 计算机科学, 2021, 48(7): 316-323. https://doi.org/10.11896/jsjkx.200800095
[11]	王英恺, 王青山. 能量收集无线通信系统中基于强化学习的能量分配策略 Reinforcement Learning Based Energy Allocation Strategy for Multi-access Wireless Communications with Energy Harvesting 计算机科学, 2021, 48(7): 333-339. https://doi.org/10.11896/jsjkx.201100154
[12]	周仕承, 刘京菊, 钟晓峰, 卢灿举. 基于深度强化学习的智能化渗透测试路径发现 Intelligent Penetration Testing Path Discovery Based on Deep Reinforcement Learning 计算机科学, 2021, 48(7): 40-46. https://doi.org/10.11896/jsjkx.210400057
[13]	李贝贝, 宋佳芮, 杜卿芸, 何俊江. DRL-IDS:基于深度强化学习的工业物联网入侵检测系统 DRL-IDS:Deep Reinforcement Learning Based Intrusion Detection System for Industrial Internet of Things 计算机科学, 2021, 48(7): 47-54. https://doi.org/10.11896/jsjkx.210400021
[14]	吴兰, 王涵, 李斌全. 基于自监督任务最优选择的无监督域自适应方法 Unsupervised Domain Adaptive Method Based on Optimal Selection of Self-supervised Tasks 计算机科学, 2021, 48(6A): 357-363. https://doi.org/10.11896/jsjkx.201000030
[15]	许明泽, 韦明辉, 邓霜, 蔡卫. 多模型集成学习在机械钻速预测中的新应用 Application of Multi-model Ensemble Learning in Prediction of Mechanical Drilling Rate 计算机科学, 2021, 48(6A): 619-622. https://doi.org/10.11896/jsjkx.201000070

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed