Computer Science ›› 2021, Vol. 48 ›› Issue (4): 274-281. doi: 10.11896/jsjkx.200300028

• Computer Networks •


RFID Indoor Positioning Algorithm Based on Proximal Policy Optimization

LI Li, ZHENG Jia-li, LUO Wen-cong, QUAN Yi-xuan   

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
     Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
  • Received: 2020-06-24  Revised: 2020-06-29  Online: 2021-04-15  Published: 2021-04-09
  • Corresponding author: ZHENG Jia-li (zjl@gxu.edu.cn)
  • About author: LI Li, born in 1994, postgraduate. Her main research interests include information processing, communication networks, reinforcement learning and Internet of things. (1114235262@qq.com)
    ZHENG Jia-li, born in 1979, professor. His main research interests include Internet of things, RFID and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61761004) and Natural Science Foundation of Guangxi Province, China (2019GXNSFAA245045).

Abstract: In a dynamic Radio Frequency Identification (RFID) indoor positioning environment, the positioning error and the computational complexity of traditional indoor positioning models increase with the number of positioning targets. This paper proposes an RFID indoor positioning algorithm based on Proximal Policy Optimization (PPO), which regards the positioning process as a Markov decision process. First, action evaluation is combined with random actions, the return of each action is then maximized, and finally the optimal coordinate value is selected. Meanwhile, the algorithm introduces clipped probability ratios that keep each policy update within a limited range, alternates between post-sampling and pre-sampling actions, updates the action policy with stochastic gradient ascent over multiple epochs of minibatches, and evaluates the actions with a critic network; the PPO positioning model is finally obtained by training. The method effectively reduces the positioning error and improves positioning efficiency while converging faster, and when dealing with a large number of positioning targets it greatly reduces the computational complexity. Experimental results show that, compared with other RFID indoor positioning algorithms such as Twin Delayed Deep Deterministic Policy Gradient (TD3), Deep Deterministic Policy Gradient (DDPG) and Actor-Critic using Kronecker-Factored Trust Region (ACKTR), the average positioning error of the proposed method decreases by 36.361%, 30.696% and 28.167% respectively, the positioning stability improves by 46.691%, 34.926% and 16.911%, and the computational complexity decreases by 84.782%, 70.213% and 63.158%.
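The "clipped probability ratios" named in the abstract are the defining quantity of PPO's surrogate objective (Schulman et al., arXiv:1707.06347). For reference, the standard clipped objective maximized during training is

    L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},

where \hat{A}_t is the advantage estimated with the critic network and \epsilon bounds how far a single update may move the policy. In the positioning setting described above, the state s_t would be the tag's signal observation and the action a_t the predicted coordinate; this symbol assignment is our reading of the abstract, not notation taken from the paper itself.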
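Below is a minimal sketch of the training step the abstract describes: several epochs of minibatch stochastic-gradient updates on the clipped surrogate, with a critic network evaluating the actions. It assumes a hypothetical formulation in which the state is an RSSI feature vector and the action is a predicted (x, y) coordinate; PolicyNet, ValueNet, ppo_update and all hyperparameter values are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    # Hypothetical actor network: maps an RSSI feature vector to a Gaussian
    # distribution over the predicted (x, y) coordinate.
    class PolicyNet(nn.Module):
        def __init__(self, n_rssi, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(n_rssi, hidden), nn.Tanh(),
                                      nn.Linear(hidden, hidden), nn.Tanh())
            self.mu = nn.Linear(hidden, 2)            # mean of (x, y)
            self.log_std = nn.Parameter(torch.zeros(2))

        def dist(self, s):
            return torch.distributions.Normal(self.mu(self.body(s)), self.log_std.exp())

    # Hypothetical critic network: estimates the state value used for the advantage.
    class ValueNet(nn.Module):
        def __init__(self, n_rssi, hidden=64):
            super().__init__()
            self.v = nn.Sequential(nn.Linear(n_rssi, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))

        def forward(self, s):
            return self.v(s).squeeze(-1)

    def ppo_update(actor, critic, opt, states, actions, old_log_probs,
                   returns, advantages, clip_eps=0.2, epochs=10, batch_size=64):
        """Several epochs of minibatch updates on the clipped surrogate objective.
        Gradient descent on the negated surrogate is equivalent to the gradient
        ascent described in the abstract."""
        n = states.size(0)
        for _ in range(epochs):
            for idx in torch.randperm(n).split(batch_size):
                s, a, adv = states[idx], actions[idx], advantages[idx]
                # Probability ratio between the updated and the sampling-time policy.
                ratio = (actor.dist(s).log_prob(a).sum(-1) - old_log_probs[idx]).exp()
                # Clipping keeps each policy step inside [1 - eps, 1 + eps].
                surrogate = torch.min(ratio * adv,
                                      torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
                value_loss = (critic(s) - returns[idx]).pow(2).mean()
                loss = -surrogate.mean() + 0.5 * value_loss
                opt.zero_grad()
                loss.backward()
                opt.step()

In use, opt would be a combined optimizer such as torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4), and old_log_probs would hold the summed log-probabilities recorded when the actions were sampled.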

Key words: Clipped probability ratios, Deep reinforcement learning, Indoor positioning, RFID

CLC number: TP301.6