  • Corresponding author: ZHENG Jia-li (zjl@gxu.edu.cn)
RFID Indoor Positioning Algorithm Based on Asynchronous Advantage Actor-Critic

LI Li, ZHENG Jia-li, WANG Zhe, YUAN Yuan, SHI Jing

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
  2. Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, China
  • Received: 2019-01-10 Online: 2020-02-15 Published: 2020-03-18
  • About the authors: LI Li, born in 1994, postgraduate. Her main research interests include information processing and communication networks, reinforcement learning and internet of things; ZHENG Jia-li, born in 1979, professor. His main research interests include internet of things, RFID and artificial intelligence.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61761004).

Abstract: The accuracy of existing RFID indoor positioning algorithms is easily affected by environmental factors, and their robustness is limited. To address this, this paper proposes an RFID indoor positioning algorithm based on asynchronous advantage actor-critic (A3C). The algorithm has two stages. First, in the training stage, the received signal strength indicator (RSSI) values of the RFID signals are used as input. Multiple worker threads run sub-actor networks that interact, sample, and learn in parallel, while sub-critic networks evaluate how good each action value is, so that the model is continuously optimized to find the optimal RSSI values and the positioning model is trained. The sub-thread networks periodically and asynchronously update their network parameters to the global network, which finally outputs the specific locations of the reference tags; this yields the trained A3C positioning model. Second, in the online positioning stage, when a target enters the area under test, its RSSI values are recorded and fed into the A3C positioning model. The sub-thread networks obtain the latest positioning information from the global network, locate the target, and output its specific position. The proposed algorithm was compared with traditional RFID indoor positioning algorithms based on support vector machines (SVM), extreme learning machine (ELM), and multi-layer perceptron (MLP). Experimental results show that, relative to these three algorithms, the mean positioning error decreased by 66.114%, 50.316%, and 44.494%, respectively, and the positioning stability improved on average by 59.733%, 53.083%, and 43.748%, respectively. These results indicate that the proposed algorithm performs well when handling a large number of indoor positioning targets.

Key words: Asynchronous advantage actor-critic, Indoor positioning, Reinforcement learning, RFID, RSSI
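
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear softmax actor and linear critic stand in for the neural networks, and the 2x2 grid of reference cells, the reader positions, the log-distance path-loss RSSI model, the reward (negative position error), and all names (`GlobalNet`, `worker`, `train`, `locate`, `rssi_features`) are assumptions made for illustration. What the sketch does show is the pattern the abstract names: worker threads pull the global parameters, sample in parallel, weight the actor update by the advantage estimated with the critic, and push their gradients to the global network asynchronously.

```python
import math
import random
import threading

# Hypothetical 2x2 grid of reference-tag cells and four reader positions (metres).
CELLS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
READERS = [(-0.5, -0.5), (-0.5, 1.5), (1.5, -0.5), (1.5, 1.5)]


def rssi_features(pos, rng, noise=1.0):
    """Toy log-distance path-loss RSSI (assumed model), scaled, plus a bias term."""
    feats = []
    for rx, ry in READERS:
        d = math.hypot(pos[0] - rx, pos[1] - ry)
        rssi = -40.0 - 20.0 * math.log10(d) + rng.gauss(0.0, noise)
        feats.append((rssi + 45.0) / 10.0)
    return feats + [1.0]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class GlobalNet:
    """Shared actor (one weight row per cell) and critic (one weight vector)."""

    def __init__(self, n_feat):
        self.actor = [[0.0] * n_feat for _ in CELLS]
        self.critic = [0.0] * n_feat
        self.updates = 0
        self.lock = threading.Lock()

    def snapshot(self):
        with self.lock:
            return [row[:] for row in self.actor], self.critic[:]

    def apply(self, g_actor, g_critic, lr):
        with self.lock:
            for k, row in enumerate(g_actor):
                for j, g in enumerate(row):
                    self.actor[k][j] += lr * g
            for j, g in enumerate(g_critic):
                self.critic[j] += lr * g
            self.updates += 1


def worker(net, seed, n_updates=200, batch=8, lr=0.05):
    rng = random.Random(seed)
    n_feat = len(net.critic)
    for _ in range(n_updates):
        actor, critic = net.snapshot()  # pull the latest global parameters
        g_actor = [[0.0] * n_feat for _ in CELLS]
        g_critic = [0.0] * n_feat
        for _ in range(batch):
            true_idx = rng.randrange(len(CELLS))
            x = rssi_features(CELLS[true_idx], rng)
            probs = softmax([dot(row, x) for row in actor])
            a = rng.choices(range(len(CELLS)), weights=probs)[0]
            reward = -math.dist(CELLS[a], CELLS[true_idx])  # negative position error
            advantage = reward - dot(critic, x)             # A = r - V(s)
            for k in range(len(CELLS)):
                coeff = advantage * ((1.0 if k == a else 0.0) - probs[k])
                for j in range(n_feat):
                    g_actor[k][j] += coeff * x[j] / batch
            for j in range(n_feat):
                g_critic[j] += advantage * x[j] / batch
        net.apply(g_actor, g_critic, lr)  # push gradients to the global network


def train(n_workers=4, **kwargs):
    """Offline stage: run the workers in parallel and return the global network."""
    net = GlobalNet(n_feat=len(READERS) + 1)
    threads = [threading.Thread(target=worker, args=(net, s), kwargs=kwargs)
               for s in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return net


def locate(net, x):
    """Online stage: return the most probable cell for an RSSI feature vector."""
    actor, _ = net.snapshot()
    probs = softmax([dot(row, x) for row in actor])
    return CELLS[max(range(len(CELLS)), key=probs.__getitem__)]
```

A possible use is `net = train()` followed by `locate(net, rssi_features((1.0, 0.0), random.Random(1)))`. Because the workers interleave nondeterministically, the learned weights differ between runs; each episode here is a single step, so the return equals the immediate reward and no discounting is needed.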


