Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210800257-9.doi: 10.11896/jsjkx.210800257

• Artificial Intelligence •

Study on Dual Sequence Decision-making for Trucks and Cargo Matching Based on Dual Pointer Network

CAI Yue, WANG En-liang, SUN Zhe, SUN Zhi-xin   

  1. Post Big Data Technology and Application Engineering Research Center of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2. Post Industry Technology Research and Development Center of the State Posts Bureau (Internet of Things Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Online:2022-11-10 Published:2022-11-21
  • About author: CAI Yue, born in 1997, postgraduate. His main research interests include information networks, reinforcement learning, and deep neural networks.
    SUN Zhi-xin, born in 1964, Ph.D, professor, doctoral supervisor. His main research interests include the theory and technology of network communication, computer networks, and security.
  • Supported by:
    National Natural Science Foundation of China(61972208).

Abstract: Due to the uneven utilization of road transportation resources in China, matching the supply and demand of trucks and cargo has become a pressing problem. To maximize the utilization of overall transportation resources, truck-cargo matching platforms need to integrate transportation demand and capacity, reduce costs, and improve efficiency. Most platforms rely on heuristic algorithms to solve the truck-cargo matching problem, but such algorithms hit an optimization bottleneck on large-scale instances. To address this, this paper transforms the supply-demand matching problem of trucks and cargo into a dual sequence decision-making problem for the first time, and on this basis studies an efficient algorithm suited to today's truck-cargo matching scenarios. First, a mathematical model of truck-cargo matching is proposed and abstracted as a dual sequence decision problem; a dual pointer network algorithm is then proposed to solve it. The dual pointer network is trained within an Actor-Critic framework and the resulting model is evaluated. Experiments show that the dual pointer network's truck-cargo matching method is comparable to traditional heuristic algorithms on small problem scales, surpasses them on large problem scales, and greatly reduces time consumption.
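The abstract does not specify the internals of the dual pointer network. As a rough illustration only, the core operation of a standard pointer network (Vinyals et al.), on which the proposed model builds, is a scoring step that produces a probability distribution over input positions (here, candidate cargo or truck items); the matrices `W1`, `W2` and vector `v` are assumed learnable parameters, and the names are hypothetical:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_attention(query, enc_states, W1, W2, v):
    # Standard pointer-network scoring: u_j = v^T tanh(W1 e_j + W2 q),
    # then a softmax turns the scores into a distribution over positions.
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ query) for e in enc_states])
    return softmax(scores)

rng = np.random.default_rng(0)
d = 8
enc = [rng.standard_normal(d) for _ in range(5)]  # encoded items (e.g. cargo)
q = rng.standard_normal(d)                        # decoder query state
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
v = rng.standard_normal(d)

p = pointer_attention(q, enc, W1, W2, v)  # probability of pointing at each item
```

In a dual sequence setting, one would presumably run two such pointing steps per decision, one over each sequence (trucks and cargo), with the Actor-Critic framework supplying the policy-gradient training signal.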

Key words: Dual pointer network, Dual sequence decision-making problem, Deep reinforcement learning, Combinatorial optimization, Truck-cargo matching, Critic network

CLC Number: TP311