计算机科学 ›› 2020, Vol. 47 ›› Issue (8): 80-86.doi: 10.11896/jsjkx.191200109

所属专题: 高性能计算

• 高性能计算 • 上一篇    下一篇

基于智能放置策略的Cuckoo哈希表

金琪, 王俊昌, 付雄   

  1. 南京邮电大学计算机学院 南京 210003
  • 出版日期:2020-08-15 发布日期:2020-08-10
  • 通讯作者: 王俊昌(wangjc@njupt.edu.cn)
  • 作者简介:524983744@qq.com
  • 基金资助:
    国家自然科学基金(61602264);江苏省重点研发计划(BE2017743)

Cuckoo Hash Table Based on Smart Placement Strategy

JIN Qi, WANG Jun-chang, FU Xiong   

  1. School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Online:2020-08-15 Published:2020-08-10
  • About author:JIN Qi, born in 1995, postgraduate.His main research interests include high performance computing and data sto-rage.
    WANG Jun-chang, born in 1982, Ph.D, lecturer.His main research interests include parallel and distributed computing systems.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61602264) and Key Research & Development Plan of Jiangsu Province (BE2017743).

摘要: 由于查询时间复杂度为O(1), Cuckoo哈希表在大数据、云计算等领域得到了广泛应用。然而, 现有Cuckoo哈希表的写入操作在遇到写冲突时普遍采用随机替换策略来替换已有表项。一方面, 写入操作容易出现高迟插入和无限循环, 尤其是当哈希表负载率较高时, 甚至有重构整个哈希表的风险;另一方面, 由于现有随机替换策略将数据项尽量散布在哈希表的各个桶中, 哈希表项间缺乏良好的空间局部性, 降低了数据正向查询的效率。为解决以上问题, 提出了一种基于智能放置策略的Cuckoo哈希表。具体地, 为提升写入操作的效率, 提出了一种基于负载均衡的Cuckoo哈希表(Load-Balance Cuckoo Hash Table, LBCHT), 实时限制每个桶的负载, 并使用广度优先搜索寻找最佳Cuckoo路径, 实验结果表明LBCHT能有效减少高负载率下写入操作可能出现的长尾效应;为提升查询操作的效率, 提出了一种充分利用局部性原理的Cuckoo哈希表(Locality Principle Cuckoo Hash Table, LPCHT), 通过充分发掘哈希表项间的空间局部性, 来有效减小查询操作引起的CPU高速缓存缺失率, 提高正向查询的效率。实验结果证明, 在高负载率的压力测试环境中, 与libcuckoo相比, LBCHT的写入效率提升了50%, LPCHT的正向查询效率提升了7%。

关键词: 长尾效应, 负载均衡, 广度优先搜索, 局部性原理, 正向查询

Abstract: Because time overhead of query is O (1), Cuckoo hash table has been widely used in big data, cloud computing and other fields.However, insertion of the existing Cuckoo hash tables generally uses random replacement strategy to replace existing entries when encountering conflicts.On the one hand, writers are prone to high-latency insertion and infinite loops, especially when the load rate of hash table is relatively high, and even have the risk of reconstructing entire hash table;on the other hand, due to existing random replacement strategies, entries are scattered as much as possible in the buckets of hash table.The lack of good spatial locality among entries reduces the efficiency of positive query.To solve the above problems, this paper proposes a Cuckoo hash table based on smart placement strategy.Specifically, in order to improve efficiency of insertion, a load-balance Cuc-koo hash table (LBCHT) is proposed, which limits the load of each bucket in real time, using breadth-first search to find the best Cuckoo path to ensures load balancing among buckets.Experimental results show that LBCHT can effectively reduce long tail effect of insertion under high load rate.And in order to improve efficiency of lookup, a Cuckoo hash table that makes full use of the locality principle is proposed (LPCHT).By fully exploring the spatial Locality among entries, CPU cache miss rate caused by lookup is reduced and efficiency of positive query is improved.Experiments show that in a high load rate stress test environment, compared with libcuckoo, LBCHT throughput of insertion can be increased by 50%, and LPCHT improves positive query efficiency by 7%.

Key words: Breadth first search, Load balance, Locality principle, Long tail effect, Positive query

中图分类号: 

  • TN393
[1] ZEPHORIA.The Top 20 Valuable Facebook Statistics[EB/OL].(2018-05-01) [2019-12-01].https://zephoria.com/top-15-valuable-facebook-statistics.
[2] WU S, LI F, MEHROTRA S, et al.Query optimization for massively parallel data processing[C]∥Proceedings of the 2nd ACM Symposium on Cloud Computing.ACM, 2011:12.
[3] PETERSON W.Addressing for random-access storage[J].IBM journal of Research and Development, 1957, 1(2):130-146.
[4] WANG X, YU H, YIN Y L.Efficient collision search attacks onSHA-0[C]∥Annual International Cryptology Conference.Springer, Berlin, Heidelberg, 2005:1-16.
[5] VITTER J S.Analysis of the search performance of coalescedhashing[J].Journal of the ACM (JACM), 1983, 30(2):231-258.
[6] ZUKOWSKI M, HMAN S, BONCZ P.Architecture-conscious hashing[C]∥Proceedings of the 2nd International Workshop on Data Management on New Hardware.ACM, 2006:6.
[7] Pagh R, Rodler F F.Cuckoo hashing[J].Journal of Algorithms, 2004, 51(2):122-144.
[8] FAN B, ANDERSEN D G, KAMINSKY M.Memc3:Compactand concurrent memcache with dumber caching and smarter hashing[C]∥10th Symposium on Networked Systems Design and Implementation.2013:371-384.
[9] SUN Y, HUA Y, FENG D, et al.A Collision-mitigation cuckoo hashing scheme for large-scale storage systems[J].IEEE Transactions on Parallel and Distributed Systems, 2016, 28(3):619-632.
[10]KUSZMAUL W.Fast concurrent cuckoo kick-out evictionschemes for high-density tables[J].arXiv:1605.05236, 2016.
[11]SAHNI K.CuckooQ Hashing[J].Rochester Institute of Technology, 2018, 8(5):1-5.
[12]SUN Y, HUA Y, CHEN Z, et al.Mitigating asymmetric read and write costs in cuckoo hashing for storage systems[C]∥2019 Annual Technical Conference.2019:329-344.
[13]SUN Y, HUA Y, JIANG S, et al.SmartCuckoo:A fast and cost-efficient hashing index scheme for cloud storage systems[C]∥2017 Annual Technical Conference.2017:553-565.
[14]PAGH R, RODLER F F.Cuckoo hashing[J].Journal of Algorithms, 2004, 51(2):122-144.
[15]Github.efficient/libcuckoo[EB/OL].(2019-12-1) [2019-10-2].
https://github.com/efficient/libcuckoo.
[16]GitHub.sparsehash/sparsehash[EB/OL].(2019-07-18) [ 2019-10-18].https://github.com/sparsehash/sparsehash.
[17]BRESLOW A D, ZHANG D P, GREATHOUSE J L, et al.Horton tables:Fast hash tables for in-memory data-intensive computing[C]∥2016 Annual Technical Conference.2016:281-294.
[18]LIU Y, ZHANG K, SPEAR M.Dynamic-sized nonblocking hash tables[C]∥Proceedings of the 2014 ACM symposium on Principles of distributed computing.ACM, 2014:242-251.
[19]TRIPLETT J, MCKENNEY P E, WALPOLE J.Resizable, Sca-lable, Concurrent Hash Tables via Relativistic Programming[C]∥USENIX Annual Technical Conference.2011:1-14.
[20]MCKENNEY P E, APPAVOO J, KLEEN A, et al.Read-copyupdate[C]∥AUUG Conference Proceedings.2001:175.
[21]LI X, ANDERSEN D G, KAMINSKY M, et al.Algorithmic improvements for fast concurrent cuckoo hashing[C]∥Procee-dings of the Ninth European Conference on Computer Systems.ACM, 2014:27.
[22]LIM H, FAN B, ANDERSEN D G, et al.SILT:A memory-efficient, high-performance key-value store[C]∥Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles.ACM, 2011:1-13.
[23]GitHub.silt/silt[EB/OL].(2017-9-7) [2020-3-1].https://github.com/silt/silt.
[1] 田真真, 蒋维, 郑炳旭, 孟利民.
基于服务器集群的负载均衡优化调度算法
Load Balancing Optimization Scheduling Algorithm Based on Server Cluster
计算机科学, 2022, 49(6A): 639-644. https://doi.org/10.11896/jsjkx.210800071
[2] 高捷, 刘沙, 黄则强, 郑天宇, 刘鑫, 漆锋滨.
基于国产众核处理器的深度神经网络算子加速库优化
Deep Neural Network Operator Acceleration Library Optimization Based on Domestic Many-core Processor
计算机科学, 2022, 49(5): 355-362. https://doi.org/10.11896/jsjkx.210500226
[3] 谭双杰, 林宝军, 刘迎春, 赵帅.
基于机器学习的分布式星载RTs系统负载调度算法
Load Scheduling Algorithm for Distributed On-board RTs System Based on Machine Learning
计算机科学, 2022, 49(2): 336-341. https://doi.org/10.11896/jsjkx.201200126
[4] 夏中, 向敏, 黄春梅.
基于CHBL的P2P视频监控网络分层管理机制
Hierarchical Management Mechanism of P2P Video Surveillance Network Based on CHBL
计算机科学, 2021, 48(9): 278-285. https://doi.org/10.11896/jsjkx.201200056
[5] 宋海宁, 焦健, 刘永.
高速公路中的移动边缘计算研究
Research on Mobile Edge Computing in Expressway
计算机科学, 2021, 48(6A): 383-386. https://doi.org/10.11896/jsjkx.200900212
[6] 王政, 姜春茂.
一种基于三支决策的云任务调度优化算法
Cloud Task Scheduling Algorithm Based on Three-way Decisions
计算机科学, 2021, 48(6A): 420-426. https://doi.org/10.11896/jsjkx.201000023
[7] 郑增乾, 王锟, 赵涛, 蒋维, 孟利民.
带宽和时延受限的流媒体服务器集群负载均衡机制
Load Balancing Mechanism for Bandwidth and Time-delay Constrained Streaming Media Server Cluster
计算机科学, 2021, 48(6): 261-267. https://doi.org/10.11896/jsjkx.200400131
[8] 姚泽玮, 林嘉雯, 胡俊钦, 陈星.
基于PSO-GA的多边缘负载均衡方法
PSO-GA Based Approach to Multi-edge Load Balancing
计算机科学, 2021, 48(11A): 456-463. https://doi.org/10.11896/jsjkx.210100191
[9] 杨紫淇, 蔡英, 张皓晨, 范艳芳.
基于负载均衡的VEC服务器联合计算任务卸载方案
Computational Task Offloading Scheme Based on Load Balance for Cooperative VEC Servers
计算机科学, 2021, 48(1): 81-88. https://doi.org/10.11896/jsjkx.200800220
[10] 郭飞雁, 唐兵.
基于用户延迟感知的移动边缘服务器放置方法
Mobile Edge Server Placement Method Based on User Latency-aware
计算机科学, 2021, 48(1): 103-110. https://doi.org/10.11896/jsjkx.200900146
[11] 王国澎, 杨剑新, 尹飞, 蒋生健.
负载均衡的处理器运算资源分配方法
Computing Resources Allocation with Load Balance in Modern Processor
计算机科学, 2020, 47(8): 41-48. https://doi.org/10.11896/jsjkx.191000148
[12] 高子妍, 王勇.
面向云服务的分布式消息系统负载均衡策略
Load Balancing Strategy of Distributed Messaging System for Cloud Services
计算机科学, 2020, 47(6A): 318-324. https://doi.org/10.11896/JsJkx.191100012
[13] 黄梅根, 汪涛, 刘亮, 庞瑞琴, 杜欢.
基于软件定义网络资源优化的虚拟网络功能部署策略
Virtual Network Function Deployment Strategy Based on Software Defined Network Resource Optimization
计算机科学, 2020, 47(6A): 404-408. https://doi.org/10.11896/JsJkx.191000116
[14] 周建新, 张志鹏, 周宁.
基于CKSP的分段路由负载均衡技术
Load Balancing Technology of Segment Routing Based on CKSP
计算机科学, 2020, 47(4): 256-261. https://doi.org/10.11896/jsjkx.190500122
[15] 朱岸青, 李帅, 唐晓东.
Spark平台中的并行化FP_growth关联规则挖掘方法
Parallel FP_growth Association Rules Mining Method on Spark Platform
计算机科学, 2020, 47(12): 139-143. https://doi.org/10.11896/jsjkx.191000110
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!