Computer Science, 2020, Vol. 47, Issue (2): 233-238. doi: 10.11896/jsjkx.190100070

• Computer Networks •



  • Corresponding author: ZHENG Jia-li (zjl@gxu.edu.cn)

RFID Indoor Positioning Algorithm Based on Asynchronous Advantage Actor-Critic

LI Li,ZHENG Jia-li,WANG Zhe,YUAN Yuan,SHI Jing   

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China;
    2. Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
  • Received:2019-01-10 Online:2020-02-15 Published:2020-03-18
  • About author: LI Li, born in 1994, postgraduate. Her main research interests include information processing and communication networks, reinforcement learning and Internet of Things; ZHENG Jia-li, born in 1979, professor. His main research interests include Internet of Things, RFID and artificial intelligence.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61761004).


Abstract: Since the accuracy of existing RFID indoor positioning algorithms is easily affected by environmental factors and their robustness is limited, this paper proposed an RFID indoor positioning algorithm based on asynchronous advantage actor-critic (A3C). The algorithm works in two stages. First, in the offline training stage, the RSSI values of the RFID signal strength are used as input. Multiple threaded sub-action (actor) networks sample and learn from the environment in parallel, while sub-evaluation (critic) networks assess the quality of the chosen actions, so that the model is continuously optimized to find the best RSSI values and train the positioning model. Each sub-thread network periodically pushes its parameters asynchronously to the global network; the global network finally outputs the specific locations of the reference tags, yielding the trained A3C positioning model. Second, in the online positioning stage, when a target enters the area to be tested, its RSSI values are recorded and fed into the A3C positioning model. The sub-thread networks obtain the latest positioning information from the global network, locate the target to be tested, and finally output its specific position. The proposed algorithm was compared with traditional RFID indoor positioning algorithms based on Support Vector Machines (SVM), Extreme Learning Machine (ELM), and Multi-Layer Perceptron (MLP). Experimental results show that the mean positioning error of the proposed algorithm is decreased by 66.114%, 50.316% and 44.494% respectively, and the average positioning stability is improved by 59.733%, 53.083% and 43.748% respectively, indicating that the proposed A3C-based RFID indoor positioning algorithm has better positioning performance when dealing with a large number of indoor positioning targets.
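The two-stage scheme described above can be sketched in a single file: worker threads (the "sub-action networks") sample RSSI readings, a softmax actor picks a candidate grid cell, a linear critic supplies the baseline (advantage = r - V(s)), and accumulated gradients are pushed asynchronously to shared global parameters, from which each worker then re-syncs. The 4-reader/4-cell toy environment, the linear models and the 0/1 reward are illustrative assumptions, not the paper's actual networks.

```python
import threading

import numpy as np

N_READERS, N_CELLS = 4, 4
SIG = -60.0 + 20.0 * np.eye(N_CELLS)        # idealised RSSI signature per cell

global_W = np.zeros((N_READERS, N_CELLS))   # shared actor (policy) weights
global_v = np.zeros(N_READERS)              # shared critic (value) weights
lock = threading.Lock()

def features(rssi):
    return (rssi + 60.0) / 20.0             # normalise dBm readings to ~[0, 1]

def worker(seed, episodes=3000, lr=0.1, sync_every=10):
    rng = np.random.default_rng(seed)
    W, v = global_W.copy(), global_v.copy()         # thread-local copies
    dW, dv = np.zeros_like(W), np.zeros_like(v)
    for ep in range(episodes):
        cell = rng.integers(N_CELLS)                # true position
        f = features(SIG[cell] + rng.normal(0.0, 1.0, N_READERS))
        logits = f @ W
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = rng.choice(N_CELLS, p=p)                # actor samples a cell
        r = 1.0 if a == cell else 0.0               # reward: correct cell found
        adv = r - f @ v                             # advantage = r - V(s)
        g = -p; g[a] += 1.0                         # d log pi(a|s) / d logits
        dW += lr * np.outer(f, adv * g)             # accumulate actor gradient
        dv += lr * adv * f                          # accumulate critic gradient
        if (ep + 1) % sync_every == 0:
            with lock:                              # asynchronous update of
                np.add(global_W, dW, out=global_W)  # the global network...
                np.add(global_v, dv, out=global_v)
                W, v = global_W.copy(), global_v.copy()  # ...then re-sync
            dW[:], dv[:] = 0.0, 0.0

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()

def locate(rssi):
    """Online phase: feed a recorded RSSI vector to the trained global actor."""
    return int(np.argmax(features(rssi) @ global_W))
```

Positioning is reduced here to choosing a discrete grid cell, so the actor doubles as a classifier over candidate positions; the paper's system instead outputs the specific positions of reference tags from many readings.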

Key words: Asynchronous advantage actor-critic, Indoor positioning, Reinforcement learning, RFID, RSSI
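The comparison figures quoted in the abstract reduce to two summary statistics. A short sketch of how such percentages are typically derived, where "stability" is taken as the standard deviation of the positioning error (an assumption; the page does not define the metric) and the error samples are fabricated placeholders, not the experimental data:

```python
import numpy as np

def mean_error_reduction(baseline_err, proposed_err):
    b, p = np.mean(baseline_err), np.mean(proposed_err)
    return 100.0 * (b - p) / b              # % decrease in mean error

def stability_improvement(baseline_err, proposed_err):
    b, p = np.std(baseline_err), np.std(proposed_err)
    return 100.0 * (b - p) / b              # smaller spread = more stable

svm_err = np.array([1.2, 0.9, 1.5, 1.1])    # placeholder errors (metres)
a3c_err = np.array([0.4, 0.3, 0.5, 0.4])
print(round(mean_error_reduction(svm_err, a3c_err), 1))   # → 66.0
```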


  • CLC Number: TP301.6
[1]SHI J Y,QIN X L,WANG L.Gradient and Constant-game Based RFID Indoor Localization Algorithm[J].Computer Science,2015,42(11):138-143.
[2]ZHENG J,YANG Y,HE X,et al.Multiple-port reader antenna with three modes for UHF RFID applications[J].Electronics Letters,2018,54(5):264-266.
[3]LIU K,ZHANG W,ZHANG W D,et al.A Wireless Positioning Method Based on Deep Neural Network[J].Computer Engineering,2016,42(7):82-85.
[4]YANG Y N,XIA B,YUAN W,et al.Research on Ranging Algorithm Based on Convolution Neural Network[J].Journal of Chongqing University of Technology(Natural Science),2018(3):172-177.
[5]WANG C,WU F,SHI Z,et al.Indoor positioning technique by combining RFID and particle swarm optimization-based back propagation neural network[J].Optik - International Journal for Light and Electron Optics,2016,127(17):6839-6849.
[6]WANG C,SHI Z,WU F,et al.An RFID indoor positioning system by using Particle Swarm Optimization-based Artificial Neural Network[C]∥2016 International Conference on Audio,Language and Image Processing (ICALIP).IEEE Computer Society,2017:738-742.
[7]KUNG H Y,CHAISIT S,PHUONG N T M.Optimization of an RFID location identification scheme based on the neural network[J].International Journal of Communication Systems,2015,28(4):625-644.
[8]JIANG X,LIU J,CHEN Y,et al.Feature Adaptive Online Sequential Extreme Learning Machine for lifelong indoor localization[J].Neural Computing & Applications,2016,27(1):215-225.
[9]LIU F,ZHONG D.GSOS-ELM:An RFID-Based Indoor Localization System Using GSO Method and Semi-Supervised Online Sequential ELM[J].Sensors,2018,18(7):1995.
[10]GAO Z,MA Y,LIU K,et al.An Indoor Multi-tag Cooperative Localization Algorithm Based on NMDS for RFID[J].IEEE Sensors Journal,2017,17(7):2120-2128.
[11]ZHAO Y,LIU K,MA Y,et al.Similarity Analysis-Based Indoor Localization Algorithm With Backscatter Information of Passive UHF RFID Tags[J].IEEE Sensors Journal,2016,17(99):1-1.
[12]SUTTON R,BARTO A.Reinforcement Learning:An Introduction(second edition)[M].The MIT Press,2018.
[13]MURRAY D G.A computational model for TensorFlow:an introduction[C]∥Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages.New York:ACM,2017:1-7.
[14]ABADI M.TensorFlow:learning functions at scale[J].ACM SIGPLAN Notices,2016,51(9):1-1.
[15]SCHMIDHUBER J.Deep learning in neural networks:An overview[J].Neural Network,2015,61(5):85-117.
[16]SIMONYAN K,ZISSERMAN A.Very Deep Convolutional Networks for Large-Scale Image Recognition[J].arXiv:1409.1556,2014.
[17]SONG R,LEWIS F,WEI Q,et al.Multiple actor-critic struc-tures for continuous-time optimal control using input-output data[J].IEEE Transactions on Neural Networks and Learning Systems,2015,26(4):851-865.
[18]MNIH V,BADIA A P,MIRZA M,et al.Asynchronous Methods for Deep Reinforcement Learning[J].arXiv:1602.01783v2,2016.
[19]BURTON A,PARIKH T,MASCARENHAS S,et al.Driver identification and authentication with active behavior modeling[C]∥12th International Conference on Network and Service Management(CNSM).IEEE Computer Society,2017:388-393.
[20]ALARIFI A,ALSALMAN A M,ALSALEH M,et al.Ultra Wideband Indoor Positioning Technologies:Analysis and Recent Advances[J].IEEE Sensors,2016,16(5):1-36.
[21]ZHAI X,ALI A A S,AMIRA A,et al.MLP Neural Network Based Gas Classification System on Zynq SoC[J].IEEE Access,2017,4(99):8138-8146.