Computer Science, 2021, Vol. 48, Issue (4): 274-281. doi: 10.11896/jsjkx.200300028

• Computer Network •

RFID Indoor Positioning Algorithm Based on Proximal Policy Optimization

LI Li, ZHENG Jia-li, LUO Wen-cong, QUAN Yi-xuan   

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
    Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
  • Received: 2020-06-24 Revised: 2020-06-29 Online: 2021-04-15 Published: 2021-04-09
  • About author: LI Li, born in 1994, postgraduate. Her main research interests include information processing, communication networks, reinforcement learning and Internet of Things. (1114235262@qq.com)
    ZHENG Jia-li, born in 1979, professor. His main research interests include Internet of Things, RFID and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61761004) and Natural Science Foundation of Guangxi Province, China (2019GXNSFAA245045).

Abstract: In dynamic Radio Frequency Identification (RFID) indoor positioning environments, the positioning error and computational complexity of traditional indoor positioning models grow as the number of positioning targets increases. This paper proposes an RFID positioning algorithm based on Proximal Policy Optimization (PPO), which treats positioning as a Markov decision process. First, action evaluation is combined with random actions, and the return of the action is maximized to select the best coordinate value. Meanwhile, with the action limited to a certain range, the algorithm introduces clipped probability ratios and alternately uses the post-sampling and pre-sampling actions; it then updates the policy by stochastic gradient ascent over multiple epochs of minibatches and evaluates the actions with the critic network. Finally, the PPO positioning model is obtained by training. The method effectively reduces positioning error and improves positioning efficiency. It also converges faster and, especially when handling a large number of positioning targets, greatly reduces computational complexity. Experimental results show that, compared with other RFID indoor positioning algorithms such as Twin Delayed Deep Deterministic policy gradient (TD3), Deep Deterministic Policy Gradient (DDPG) and Actor Critic using Kronecker-Factored Trust Region (ACKTR), the proposed method reduces the average positioning error by 36.361%, 30.696% and 28.167%, respectively, improves positioning stability by 46.691%, 34.926% and 16.911%, and reduces computational complexity by 84.782%, 70.213% and 63.158%.
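
To illustrate the clipped probability ratios mentioned in the abstract, the following is a minimal sketch of the standard PPO clipped surrogate objective in plain Python. The abstract gives no implementation details, so the function name `clipped_surrogate`, the clipping parameter `epsilon=0.2` and the toy inputs are assumptions for illustration only, not the authors' code or settings.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over one minibatch (to be maximized).

    All names and the default epsilon are illustrative assumptions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Toy minibatch: two actions whose probabilities moved from 0.5 to 0.75 and 0.25.
old = [math.log(0.5), math.log(0.5)]
new = [math.log(0.75), math.log(0.25)]
adv = [1.0, -1.0]
print(clipped_surrogate(new, old, adv))
```

In the toy minibatch, the first ratio (about 1.5) is clipped to 1.2 and the second (about 0.5) to 0.8. The objective therefore stops rewarding policy changes beyond the clipping range, which is what keeps each update close to the policy that generated the samples.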

Key words: Clipped probability ratios, Deep reinforcement learning, Indoor positioning, RFID

CLC Number: 

  • TP301.6