Computer Science, 2020, Vol. 47, Issue (2): 233-238. doi: 10.11896/jsjkx.190100070

• Computer Networks •



  • Corresponding author: ZHENG Jia-li (zjl@gxu.edu.cn)

RFID Indoor Positioning Algorithm Based on Asynchronous Advantage Actor-Critic

LI Li,ZHENG Jia-li,WANG Zhe,YUAN Yuan,SHI Jing   

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China;
    2. Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
  • Received:2019-01-10 Online:2020-02-15 Published:2020-03-18
  • About author: LI Li, born in 1994, postgraduate. Her main research interests include information processing and communication networks, reinforcement learning and Internet of Things; ZHENG Jia-li, born in 1979, professor. His main research interests include Internet of Things, RFID and artificial intelligence.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61761004).


Abstract: Since the accuracy of existing RFID indoor positioning algorithms is easily affected by environmental factors and their robustness is limited, this paper proposed an RFID indoor positioning algorithm based on asynchronous advantage actor-critic (A3C). The algorithm works in two stages. First, in the offline training stage, the RSSI values of the RFID signal strength are used as input. Multiple threaded sub-action (actor) networks sample and learn from the environment in parallel, while sub-evaluation (critic) networks assess the quality of the chosen actions, so that the model is continuously optimized to find the best RSSI values and train the positioning model. Each sub-thread network periodically pushes its parameters asynchronously to the global network; the global network finally outputs the specific locations of the reference tags, yielding the trained A3C positioning model. Second, in the online positioning stage, when a target enters the area to be tested, its RSSI values are recorded and fed into the A3C positioning model. The sub-thread networks obtain the latest positioning information from the global network, locate the target to be tested, and finally output its specific position. The proposed algorithm was compared with traditional RFID indoor positioning algorithms based on Support Vector Machines (SVM), Extreme Learning Machine (ELM), and Multi-Layer Perceptron (MLP). Experimental results show that the mean positioning error of the proposed algorithm is decreased by 66.114%, 50.316% and 44.494% respectively, and the average positioning stability is improved by 59.733%, 53.083% and 43.748% respectively, indicating that the proposed A3C-based RFID indoor positioning algorithm has better positioning performance when dealing with a large number of indoor positioning targets.
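The two-stage scheme described above can be sketched in a single file: worker threads (the "sub-action networks") sample RSSI readings, a softmax actor picks a candidate grid cell, a linear critic supplies the baseline (advantage = r - V(s)), and accumulated gradients are pushed asynchronously to shared global parameters, from which each worker then re-syncs. The 4-reader/4-cell toy environment, the linear models and the 0/1 reward are illustrative assumptions, not the paper's actual networks.

```python
import threading

import numpy as np

N_READERS, N_CELLS = 4, 4
SIG = -60.0 + 20.0 * np.eye(N_CELLS)        # idealised RSSI signature per cell

global_W = np.zeros((N_READERS, N_CELLS))   # shared actor (policy) weights
global_v = np.zeros(N_READERS)              # shared critic (value) weights
lock = threading.Lock()

def features(rssi):
    return (rssi + 60.0) / 20.0             # normalise dBm readings to ~[0, 1]

def worker(seed, episodes=3000, lr=0.1, sync_every=10):
    rng = np.random.default_rng(seed)
    W, v = global_W.copy(), global_v.copy()         # thread-local copies
    dW, dv = np.zeros_like(W), np.zeros_like(v)
    for ep in range(episodes):
        cell = rng.integers(N_CELLS)                # true position
        f = features(SIG[cell] + rng.normal(0.0, 1.0, N_READERS))
        logits = f @ W
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = rng.choice(N_CELLS, p=p)                # actor samples a cell
        r = 1.0 if a == cell else 0.0               # reward: correct cell found
        adv = r - f @ v                             # advantage = r - V(s)
        g = -p; g[a] += 1.0                         # d log pi(a|s) / d logits
        dW += lr * np.outer(f, adv * g)             # accumulate actor gradient
        dv += lr * adv * f                          # accumulate critic gradient
        if (ep + 1) % sync_every == 0:
            with lock:                              # asynchronous update of
                np.add(global_W, dW, out=global_W)  # the global network...
                np.add(global_v, dv, out=global_v)
                W, v = global_W.copy(), global_v.copy()  # ...then re-sync
            dW[:], dv[:] = 0.0, 0.0

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()

def locate(rssi):
    """Online phase: feed a recorded RSSI vector to the trained global actor."""
    return int(np.argmax(features(rssi) @ global_W))
```

Positioning is reduced here to choosing a discrete grid cell, so the actor doubles as a classifier over candidate positions; the paper's system instead outputs the specific positions of reference tags from many readings.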

Key words: Asynchronous advantage actor-critic, Indoor positioning, Reinforcement learning, RFID, RSSI
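The comparison figures quoted in the abstract reduce to two summary statistics. A short sketch of how such percentages are typically derived, where "stability" is taken as the standard deviation of the positioning error (an assumption; the page does not define the metric) and the error samples are fabricated placeholders, not the experimental data:

```python
import numpy as np

def mean_error_reduction(baseline_err, proposed_err):
    b, p = np.mean(baseline_err), np.mean(proposed_err)
    return 100.0 * (b - p) / b              # % decrease in mean error

def stability_improvement(baseline_err, proposed_err):
    b, p = np.std(baseline_err), np.std(proposed_err)
    return 100.0 * (b - p) / b              # smaller spread = more stable

svm_err = np.array([1.2, 0.9, 1.5, 1.1])    # placeholder errors (metres)
a3c_err = np.array([0.4, 0.3, 0.5, 0.4])
print(round(mean_error_reduction(svm_err, a3c_err), 1))   # → 66.0
```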


  • CLC Number: TP301.6
[1]SHI J Y,QIN X L,WANG L.Gradient and Constant-game Based RFID Indoor Localization Algorithm[J].Computer Science,2015,42(11):138-143.
[2]ZHENG J,YANG Y,HE X,et al.Multiple-port reader antenna with three modes for UHF RFID applications[J].Electronics Letters,2018,54(5):264-266.
[3]LIU K,ZHANG W,ZHANG W D,et al.A Wireless Positioning Method Based on Deep Neural Network[J].Computer Engineering,2016,42(7):82-85.
[4]YANG Y N,XIA B,YUAN W,et al.Research on Ranging Algorithm Based on Convolution Neural Network[J].Journal of Chongqing University of Technology(Natural Science),2018(3):172-177.
[5]WANG C,WU F,SHI Z,et al.Indoor positioning technique by combining RFID and particle swarm optimization-based back propagation neural network[J].Optik - International Journal for Light and Electron Optics,2016,127(17):6839-6849.
[6]WANG C,SHI Z,WU F,et al.An RFID indoor positioning system by using Particle Swarm Optimization-based Artificial Neural Network[C]∥2016 International Conference on Audio,Language and Image Processing (ICALIP).IEEE Computer Society,2017:738-742.
[7]KUNG H Y,CHAISIT S,PHUONG N T M.Optimization of an RFID location identification scheme based on the neural network[J].International Journal of Communication Systems,2015,28(4):625-644.
[8]JIANG X,LIU J,CHEN Y,et al.Feature Adaptive Online Sequential Extreme Learning Machine for lifelong indoor localization[J].Neural Computing & Applications,2016,27(1):215-225.
[9]LIU F,ZHONG D.GSOS-ELM:An RFID-Based Indoor Localization System Using GSO Method and Semi-Supervised Online Sequential ELM[J].Sensors,2018,18(7):1995.
[10]GAO Z,MA Y,LIU K,et al.An Indoor Multi-tag Cooperative Localization Algorithm Based on NMDS for RFID[J].IEEE Sensors Journal,2017,17(7):2120-2128.
[11]ZHAO Y,LIU K,MA Y,et al.Similarity Analysis-Based Indoor Localization Algorithm With Backscatter Information of Passive UHF RFID Tags[J].IEEE Sensors Journal,2016,17(99):1-1.
[12]SUTTON R,BARTO A.Reinforcement Learning:An Introduction(second edition)[M].The MIT Press,2018.
[13]MURRAY D G.A computational model for TensorFlow:an introduction[C]∥Proceedings of the 1st ACM SIGPLAN International Workshop on Machine Learning and Programming Languages.New York:ACM,2017:1-7.
[14]ABADI M.TensorFlow:learning functions at scale[J].ACM SIGPLAN Notices,2016,51(9):1-1.
[15]SCHMIDHUBER J.Deep learning in neural networks:An overview[J].Neural Network,2015,61(5):85-117.
[16]SIMONYAN K,ZISSERMAN A.Very Deep Convolutional Networks for Large-Scale Image Recognition[J].arXiv:1409.1556,2014.
[17]SONG R,LEWIS F,WEI Q,et al.Multiple actor-critic struc-tures for continuous-time optimal control using input-output data[J].IEEE Transactions on Neural Networks and Learning Systems,2015,26(4):851-865.
[18]MNIH V,BADIA A P,MIRZA M,et al.Asynchronous Methods for Deep Reinforcement Learning[J].arXiv:1602.01783v2,2016.
[19]BURTON A,PARIKH T,MASCARENHAS S,et al.Driver identification and authentication with active behavior modeling[C]∥12th International Conference on Network and Service Management(CNSM).IEEE Computer Society,2017:388-393.
[20]ALARIFI A,ALSALMAN A M,ALSALEH M,et al.Ultra Wideband Indoor Positioning Technologies:Analysis and Recent Advances[J].IEEE Sensors,2016,16(5):1-36.
[21]ZHAI X,ALI A A S,AMIRA A,et al.MLP Neural Network Based Gas Classification System on Zynq SoC[J].IEEE Access,2017,4(99):8138-8146.