Computer Science, 2020, Vol. 47, Issue (10): 41-47. doi: 10.11896/jsjkx.200700070

Special topic: Crowdsensing Computing

Reinforcement Learning Based Win-Win Game for Mobile Crowdsensing

CAI Wei, BAI Guang-wei, SHEN Hang, CHENG Zhao-wei, ZHANG Hui-li

  1. College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
  • Received: 2020-07-12 Revised: 2020-08-01 Online: 2020-10-15 Published: 2020-10-16
  • Corresponding author: SHEN Hang (hshen@njtech.edu.cn)
  • Author contact: caiwei913243@163.com
  • About author: CAI Wei, born in 1997, postgraduate. His main research interests include privacy protection, mobile crowdsensing and reinforcement learning.
    SHEN Hang, born in 1984, Ph.D., associate professor, master supervisor, is a member of China Computer Federation. His main research interests include cyber security, privacy protection and 5G network.
  • Supported by:
    National Natural Science Foundation of China (61502230), Natural Science Foundation of Jiangsu Province (BK20150960), Jiangsu Province “Six Talent Peaks” High-level Talent Project (RJFW-020) and State Key Laboratory of New Technology of Computer Software (Nanjing University) Project (KFKT2017B21)

Abstract: A mobile crowdsensing system needs to offer users personalized location privacy protection in order to attract more participants. However, because malicious attackers exist, stronger privacy protection chosen by users makes their reported locations less usable and reduces the efficiency of task allocation. To address this problem, this paper proposes a reinforcement learning based game mechanism in which users and the platform both benefit. First, two virtual entities hosted by a trusted third party simulate the interaction between users and the platform: one simulates users choosing a privacy budget and adding noise to their location data, and the other simulates the platform allocating tasks according to the users' perturbed locations. Then, the interaction is modeled as a game whose two players are these virtual entities, and its equilibrium point is derived. Finally, reinforcement learning is used to try different location perturbation strategies repeatedly and to output an optimal location perturbation scheme. Experimental results show that the mechanism optimizes task allocation utility while improving the users' overall utility as much as possible, so that users and the platform achieve a win-win outcome.

Key words: Game theory, Mobile crowdsensing, Personalized privacy-preserving, Reinforcement learning, Task allocation
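
The interaction loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the planar Laplace noise model, the candidate budgets in `BUDGETS`, the task sites in `TASKS`, both utility formulas, the summed reward, and the epsilon-greedy value update are all assumptions chosen only to make the idea concrete. The paper derives a game equilibrium, which this sketch does not attempt.

```python
import math
import random

# All constants below are hypothetical and chosen only for illustration.
BUDGETS = [0.1, 0.5, 1.0, 2.0]  # candidate privacy budgets (epsilon)
TASKS = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]  # task sites
PRIVACY_WEIGHT = 1.0  # weight of privacy in the assumed user utility


def planar_laplace_offset(epsilon, rng):
    """Draw a 2-D noise offset; a smaller epsilon gives larger noise.

    Assumption: planar Laplace noise, whose radius is Gamma(2, 1/epsilon).
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / epsilon)
    return radius * math.cos(theta), radius * math.sin(theta)


def nearest_task(location):
    """Platform side: allocate the task closest to the reported location."""
    return min(TASKS, key=lambda task: math.dist(task, location))


def play_round(true_loc, epsilon, rng):
    """One user/platform interaction; returns (user utility, platform utility)."""
    dx, dy = planar_laplace_offset(epsilon, rng)
    reported = (true_loc[0] + dx, true_loc[1] + dy)
    task = nearest_task(reported)
    travel = math.dist(true_loc, task)  # real distance the user must cover
    user_utility = PRIVACY_WEIGHT / epsilon - travel  # assumed form
    platform_utility = -travel  # assumed form: shorter trips are better
    return user_utility, platform_utility


def learn_budget(episodes=5000, explore=0.1, seed=0):
    """Epsilon-greedy value learning over the candidate privacy budgets."""
    rng = random.Random(seed)
    q = [0.0] * len(BUDGETS)
    counts = [0] * len(BUDGETS)
    for _ in range(episodes):
        true_loc = (rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0))
        if rng.random() < explore:
            action = rng.randrange(len(BUDGETS))  # explore
        else:
            action = max(range(len(BUDGETS)), key=q.__getitem__)  # exploit
        user_u, platform_u = play_round(true_loc, BUDGETS[action], rng)
        reward = user_u + platform_u  # assumed scalarisation of both sides
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]  # running mean
    best = max(range(len(BUDGETS)), key=q.__getitem__)
    return BUDGETS[best], q


if __name__ == "__main__":
    best_budget, values = learn_budget()
    print("learned budget:", best_budget)
    print("estimated values:", [round(v, 3) for v in values])
```

Raising `PRIVACY_WEIGHT` in this toy setting pushes the learner toward smaller budgets (more noise), while lowering it favors larger budgets and shorter travel distances, which mirrors the privacy versus allocation-efficiency tension the abstract describes.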

CLC Number:

  • TP393