计算机科学 ›› 2020, Vol. 47 ›› Issue (8): 80-86.doi: 10.11896/jsjkx.191200109
所属专题: 高性能计算
金琪, 王俊昌, 付雄
JIN Qi, WANG Jun-chang, FU Xiong
摘要: 由于查询时间复杂度为O(1), Cuckoo哈希表在大数据、云计算等领域得到了广泛应用。然而, 现有Cuckoo哈希表的写入操作在遇到写冲突时普遍采用随机替换策略来替换已有表项。一方面, 写入操作容易出现高迟插入和无限循环, 尤其是当哈希表负载率较高时, 甚至有重构整个哈希表的风险;另一方面, 由于现有随机替换策略将数据项尽量散布在哈希表的各个桶中, 哈希表项间缺乏良好的空间局部性, 降低了数据正向查询的效率。为解决以上问题, 提出了一种基于智能放置策略的Cuckoo哈希表。具体地, 为提升写入操作的效率, 提出了一种基于负载均衡的Cuckoo哈希表(Load-Balance Cuckoo Hash Table, LBCHT), 实时限制每个桶的负载, 并使用广度优先搜索寻找最佳Cuckoo路径, 实验结果表明LBCHT能有效减少高负载率下写入操作可能出现的长尾效应;为提升查询操作的效率, 提出了一种充分利用局部性原理的Cuckoo哈希表(Locality Principle Cuckoo Hash Table, LPCHT), 通过充分发掘哈希表项间的空间局部性, 来有效减小查询操作引起的CPU高速缓存缺失率, 提高正向查询的效率。实验结果证明, 在高负载率的压力测试环境中, 与libcuckoo相比, LBCHT的写入效率提升了50%, LPCHT的正向查询效率提升了7%。
中图分类号:
[1] ZEPHORIA.The Top 20 Valuable Facebook Statistics[EB/OL].(2018-05-01) [2019-12-01].https://zephoria.com/top-15-valuable-facebook-statistics. [2] WU S, LI F, MEHROTRA S, et al.Query optimization for massively parallel data processing[C]∥Proceedings of the 2nd ACM Symposium on Cloud Computing.ACM, 2011:12. [3] PETERSON W.Addressing for random-access storage[J].IBM journal of Research and Development, 1957, 1(2):130-146. [4] WANG X, YU H, YIN Y L.Efficient collision search attacks onSHA-0[C]∥Annual International Cryptology Conference.Springer, Berlin, Heidelberg, 2005:1-16. [5] VITTER J S.Analysis of the search performance of coalescedhashing[J].Journal of the ACM (JACM), 1983, 30(2):231-258. [6] ZUKOWSKI M, HMAN S, BONCZ P.Architecture-conscious hashing[C]∥Proceedings of the 2nd International Workshop on Data Management on New Hardware.ACM, 2006:6. [7] Pagh R, Rodler F F.Cuckoo hashing[J].Journal of Algorithms, 2004, 51(2):122-144. [8] FAN B, ANDERSEN D G, KAMINSKY M.Memc3:Compactand concurrent memcache with dumber caching and smarter hashing[C]∥10th Symposium on Networked Systems Design and Implementation.2013:371-384. [9] SUN Y, HUA Y, FENG D, et al.A Collision-mitigation cuckoo hashing scheme for large-scale storage systems[J].IEEE Transactions on Parallel and Distributed Systems, 2016, 28(3):619-632. [10]KUSZMAUL W.Fast concurrent cuckoo kick-out evictionschemes for high-density tables[J].arXiv:1605.05236, 2016. [11]SAHNI K.CuckooQ Hashing[J].Rochester Institute of Technology, 2018, 8(5):1-5. [12]SUN Y, HUA Y, CHEN Z, et al.Mitigating asymmetric read and write costs in cuckoo hashing for storage systems[C]∥2019 Annual Technical Conference.2019:329-344. [13]SUN Y, HUA Y, JIANG S, et al.SmartCuckoo:A fast and cost-efficient hashing index scheme for cloud storage systems[C]∥2017 Annual Technical Conference.2017:553-565. [14]PAGH R, RODLER F F.Cuckoo hashing[J].Journal of Algorithms, 2004, 51(2):122-144. [15]Github.efficient/libcuckoo[EB/OL].(2019-12-1) [2019-10-2]. https://github.com/efficient/libcuckoo. [16]GitHub.sparsehash/sparsehash[EB/OL].(2019-07-18) [ 2019-10-18].https://github.com/sparsehash/sparsehash. [17]BRESLOW A D, ZHANG D P, GREATHOUSE J L, et al.Horton tables:Fast hash tables for in-memory data-intensive computing[C]∥2016 Annual Technical Conference.2016:281-294. [18]LIU Y, ZHANG K, SPEAR M.Dynamic-sized nonblocking hash tables[C]∥Proceedings of the 2014 ACM symposium on Principles of distributed computing.ACM, 2014:242-251. [19]TRIPLETT J, MCKENNEY P E, WALPOLE J.Resizable, Sca-lable, Concurrent Hash Tables via Relativistic Programming[C]∥USENIX Annual Technical Conference.2011:1-14. [20]MCKENNEY P E, APPAVOO J, KLEEN A, et al.Read-copyupdate[C]∥AUUG Conference Proceedings.2001:175. [21]LI X, ANDERSEN D G, KAMINSKY M, et al.Algorithmic improvements for fast concurrent cuckoo hashing[C]∥Procee-dings of the Ninth European Conference on Computer Systems.ACM, 2014:27. [22]LIM H, FAN B, ANDERSEN D G, et al.SILT:A memory-efficient, high-performance key-value store[C]∥Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles.ACM, 2011:1-13. [23]GitHub.silt/silt[EB/OL].(2017-9-7) [2020-3-1].https://github.com/silt/silt. |
[1] | 田真真, 蒋维, 郑炳旭, 孟利民. 基于服务器集群的负载均衡优化调度算法 Load Balancing Optimization Scheduling Algorithm Based on Server Cluster 计算机科学, 2022, 49(6A): 639-644. https://doi.org/10.11896/jsjkx.210800071 |
[2] | 高捷, 刘沙, 黄则强, 郑天宇, 刘鑫, 漆锋滨. 基于国产众核处理器的深度神经网络算子加速库优化 Deep Neural Network Operator Acceleration Library Optimization Based on Domestic Many-core Processor 计算机科学, 2022, 49(5): 355-362. https://doi.org/10.11896/jsjkx.210500226 |
[3] | 谭双杰, 林宝军, 刘迎春, 赵帅. 基于机器学习的分布式星载RTs系统负载调度算法 Load Scheduling Algorithm for Distributed On-board RTs System Based on Machine Learning 计算机科学, 2022, 49(2): 336-341. https://doi.org/10.11896/jsjkx.201200126 |
[4] | 夏中, 向敏, 黄春梅. 基于CHBL的P2P视频监控网络分层管理机制 Hierarchical Management Mechanism of P2P Video Surveillance Network Based on CHBL 计算机科学, 2021, 48(9): 278-285. https://doi.org/10.11896/jsjkx.201200056 |
[5] | 宋海宁, 焦健, 刘永. 高速公路中的移动边缘计算研究 Research on Mobile Edge Computing in Expressway 计算机科学, 2021, 48(6A): 383-386. https://doi.org/10.11896/jsjkx.200900212 |
[6] | 王政, 姜春茂. 一种基于三支决策的云任务调度优化算法 Cloud Task Scheduling Algorithm Based on Three-way Decisions 计算机科学, 2021, 48(6A): 420-426. https://doi.org/10.11896/jsjkx.201000023 |
[7] | 郑增乾, 王锟, 赵涛, 蒋维, 孟利民. 带宽和时延受限的流媒体服务器集群负载均衡机制 Load Balancing Mechanism for Bandwidth and Time-delay Constrained Streaming Media Server Cluster 计算机科学, 2021, 48(6): 261-267. https://doi.org/10.11896/jsjkx.200400131 |
[8] | 姚泽玮, 林嘉雯, 胡俊钦, 陈星. 基于PSO-GA的多边缘负载均衡方法 PSO-GA Based Approach to Multi-edge Load Balancing 计算机科学, 2021, 48(11A): 456-463. https://doi.org/10.11896/jsjkx.210100191 |
[9] | 杨紫淇, 蔡英, 张皓晨, 范艳芳. 基于负载均衡的VEC服务器联合计算任务卸载方案 Computational Task Offloading Scheme Based on Load Balance for Cooperative VEC Servers 计算机科学, 2021, 48(1): 81-88. https://doi.org/10.11896/jsjkx.200800220 |
[10] | 郭飞雁, 唐兵. 基于用户延迟感知的移动边缘服务器放置方法 Mobile Edge Server Placement Method Based on User Latency-aware 计算机科学, 2021, 48(1): 103-110. https://doi.org/10.11896/jsjkx.200900146 |
[11] | 王国澎, 杨剑新, 尹飞, 蒋生健. 负载均衡的处理器运算资源分配方法 Computing Resources Allocation with Load Balance in Modern Processor 计算机科学, 2020, 47(8): 41-48. https://doi.org/10.11896/jsjkx.191000148 |
[12] | 高子妍, 王勇. 面向云服务的分布式消息系统负载均衡策略 Load Balancing Strategy of Distributed Messaging System for Cloud Services 计算机科学, 2020, 47(6A): 318-324. https://doi.org/10.11896/JsJkx.191100012 |
[13] | 黄梅根, 汪涛, 刘亮, 庞瑞琴, 杜欢. 基于软件定义网络资源优化的虚拟网络功能部署策略 Virtual Network Function Deployment Strategy Based on Software Defined Network Resource Optimization 计算机科学, 2020, 47(6A): 404-408. https://doi.org/10.11896/JsJkx.191000116 |
[14] | 周建新, 张志鹏, 周宁. 基于CKSP的分段路由负载均衡技术 Load Balancing Technology of Segment Routing Based on CKSP 计算机科学, 2020, 47(4): 256-261. https://doi.org/10.11896/jsjkx.190500122 |
[15] | 朱岸青, 李帅, 唐晓东. Spark平台中的并行化FP_growth关联规则挖掘方法 Parallel FP_growth Association Rules Mining Method on Spark Platform 计算机科学, 2020, 47(12): 139-143. https://doi.org/10.11896/jsjkx.191000110 |
|