Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (4): 274-281. doi: 10.11896/jsjkx.200300028
LI Li (李丽), ZHENG Jia-li (郑嘉利), LUO Wen-cong (罗文聪), QUAN Yi-xuan (全艺璇)
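Abstract: In dynamic radio frequency identification (RFID) indoor positioning environments, traditional indoor positioning models suffer from growing positioning error and rising computational complexity as the number of positioning targets increases. To address this problem, this paper proposes an RFID indoor positioning algorithm based on proximal policy optimization (PPO). The algorithm treats the indoor positioning process as a Markov decision process: it first combines action evaluation with random actions, then further maximizes the action return, and finally selects the optimal coordinate values. It also introduces a clipped probability ratio: actions are first restricted to a certain range, with the new and old actions from after and before sampling used alternately; the action policy is then updated in mini-batches over multiple epochs using stochastic gradients, while a critic network evaluates the actions; finally, the PPO positioning model is obtained through training. The algorithm effectively reduces positioning error and improves positioning efficiency while converging faster, and it greatly reduces computational complexity, especially when handling a large number of positioning targets. Experimental results show that, compared with other RFID indoor positioning algorithms, namely Twin Delayed Deep Deterministic Policy Gradient (TD3), Deep Deterministic Policy Gradient (DDPG), and Actor Critic using Kronecker-Factored Trust Region (ACKTR), the proposed algorithm reduces the average positioning error by 36.361%, 30.696%, and 28.167%, improves positioning stability by 46.691%, 34.926%, and 16.911%, and reduces computational complexity by 84.782%, 70.213%, and 63.158%, respectively.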
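The clipped probability ratio mentioned in the abstract can be illustrated with the standard PPO clipped surrogate objective. The following is a minimal sketch, not the authors' implementation: the function name `clipped_surrogate`, the clipping range `epsilon=0.2`, and the use of log-probabilities and precomputed advantage estimates as inputs are all assumptions made for illustration. The abstract does not state the paper's network structure, reward design, or hyperparameters.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over one mini-batch (to be maximized).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    new and old policies (hypothetical inputs for this sketch).
    advantages: advantage estimates, e.g. derived from a critic network.
    epsilon: clipping range; 0.2 is an assumed value, not taken from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the smaller of the unclipped and clipped terms, which removes
        # the incentive to move the new policy far away from the old one.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a full training loop, this quantity would typically be maximized by stochastic gradient ascent over several epochs of mini-batches drawn from the same sampled trajectories, which matches the update scheme the abstract outlines.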