计算机科学 ›› 2019, Vol. 46 ›› Issue (12): 213-219.doi: 10.11896/jsjkx.181102097

• 软件与数据库技术 • 上一篇    下一篇

基于操作历史图的分布式Key-Value数据库一致性检测算法

廖彬1, 张陶2,3, 李敏1, 于炯2, 国冰磊2, 刘炎4   

  1. (新疆财经大学统计与数据科学学院 乌鲁木齐830012)1;
    (新疆大学信息科学与工程学院 乌鲁木齐830046)2;
    (新疆医科大学医学工程技术学院 乌鲁木齐830011)3 ;
    (清华大学软件学院 北京100084)4
  • 收稿日期:2018-11-14 出版日期:2019-12-15 发布日期:2019-12-17
  • 通讯作者: 廖彬(1986-),男,博士,副教授,硕士生导师,主要研究方向为绿色计算、数据挖掘及大数据计算模型等,E-mail:liaobin665@163.com。
  • 作者简介:张陶(1988-),女,博士生,主要研究方向为分布式计算、网格计算;李敏(1990-),女,硕士,讲师,主要研究方向为大数据计算;于炯(1964-),男,博士,教授,博士生导师,主要研究方向为网络安全、网格与分布式计算;国冰磊(1991-),女,博士生,主要研究方向为绿色计算、数据库系统等;刘炎(1990-),男,硕士生,主要研究方向为大数据计算。
  • 基金资助:
    本文受国家自然科学基金项目(61562078,61462079,61862060),新疆天山青年计划项目(2018Q073)资助。

Consistency Checking Algorithm for Distributed Key-Value Database Based on Operation History Graph

LIAO Bin1, ZHANG Tao2,3, LI Min1, YU Jiong2, GUO Bing-lei2, LIU Yan4   

  1. (College of Statistics and Data Science,Xinjiang University of Finance and Economics,Urumqi 830012,China)1;
    (School of Information Science and Engineering,Xinjiang University,Urumqi 830046,China)2;
    (Department of Medical Engineering and Technology,Xinjiang Medical University,Urumqi 830011,China)3;
    (School of Software,Tsinghua University,Beijing 100084,China)4
  • Received:2018-11-14 Online:2019-12-15 Published:2019-12-17

摘要: 分布式数据库系统的副本机制在提高系统可靠性及性能的同时,导致了多副本数据管理的一致性问题;数据一致性的实现需要一致性协议模型来进行预防,也需要一致性检测算法对非一致数据进行检测。首先,对读写操作记录之间的时序关系、安全一致性及并行一致性原则等概念进行定义;其次,根据操作记录集合中读写操作之间的并行与时序关系,提取出操作记录集合向操作记录图转化的规则,并在此基础上设计了操作记录向历史记录图的转化算法;然后,以历史记录图为输入,设计了违反一致性查找算法,查找并返回图中所有违反安全与并行一致性读操作的集合;最后,基于Cassandra进行实验并将读写一致性设置为ONE,通过YCSB产生并行读写压力测试,与同类算法的对比实验验证了所提算法在功能与效率两方面的优越性。

关键词: DAG图, Key-Value数据库, 分布式数据库, 一致性检测, 一致性原则

Abstract: The replica mechanism of distributed database system not only improves reliability and performance of the overall system,but also leads to the consistency problem of multi-replica data management mechanism.To keep the consistency of data,a consistency protocol model is needed to avoid data’s inconsistency events.Moreover,consistency checking algorithms are also needed to detect inconsistent data.Firstly,the concepts of temporal relations,security consistency,and concurrent consistency between read and write operations are defined.Secondly,according to the parallel and temporal relationship between read and write operations that recorded in the set of operations,the rules of transforming operation record set to operation record graph are extracted,and then the algorithm of transforming operation records into operation record graph is also designed.Then,taking the set of operation record graph as input,a violation operation search algorithm is designed to find the set of inconsistent read operations which have violated security and parallel consistency.Finally,experiments are conducted based on Cassandra and the read-write consistency is set to ONE.YCSB generates parallel read-write stress tests.The comparative experiments with similar algorithms verify the advantages of the proposed algorithm in both function and efficiency.

Key words: Consistency check, Consistency principle, DAG diagram, Distributed database, Key-Value database

中图分类号: 

  • TP391
[1]LI J Z,WANG H Z,GAO H.State-of-the-Art of Research on Big Data Usability[J].Journal of Software,2016,27(7):1605-1625.
[2]GILBERT S,LYNCH N.Perspectives on the CAP Theorem [J].Computer,2011,45(2):30-36.
[3]LAMPORT L.Fast Paxos[J].Distributed Computing,2006,19(2):79-103.
[4]BENZ S,SOUSA L P D,PEDONE F.Stretching multi-ring Pa- xos[C]//ACM Symposium on Applied Computing.New York:ACM,2016:492-499.
[5]OKI B M,LISKOV B H.Viewstamped Replication:A New Primary Copy Method to Support Highly-Available Distributed Systems[C]//ACM Symposium on Principles of Distributed Computing.New York:ACM,1988:8-17.
[6]EL-SANOSI I,EZHILCHELVAN P.Improving ZooKeeper Atomic Broadcast Performance by Coin Tossing[C]//European Workshop on Performance Engineering.Springer,2017:249-265.
[7]SONG M,LUO G,HAIHONG E.A Service Discovery System based on Zookeeper with Priority Load Balance Strategy[C]//IEEE International Conference on Network Infrastructure and Digital Content.Piscataway,NJ:IEEE,2017:117-119.
[8]JUNQUEIRA F P,REED B C.The life and times of a zookeeper[C]//ACM Symposium on Principles of Distributed Computing.New York:ACM,2009:1-4.
[9]BECKER M Y,SEWELL P.Cassandra:flexible trust management,applied to electronic health records[C]//ComputerSecu-rity Foundations Workshop.Piscataway:IEEE,2004:139-154.
[10]MALIK P,MALIK P.Cassandra:a decentralized structured storage system[C]//ACM SIGOPS Operating Systems Review.New York:ACM,2010:35-40.
[11]WANG B Q,YU Q,LIU X,et al.Efficient and dynamic data management system for cassandra database[J].ComputerScien-ce,2016,43(7):197-202.
[12]VORA M N.Hadoop-HBase for Large-Scale Data[C]//International Conference on Computer Science and Network Technology.Piscataway,NJ:IEEE,2012:601-605.
[13]FENG S C,CAO B,CHAO D W,et al.Hash Synopsis Forest Index Schema Based on Hbase[J].Journal of Chinese Computer Systems,2018,39(1):100-104.
[14]BOYD S,GHOSH A,PRABHAKAR B,et al.Randomized gossip algorithms[J].IEEE Transactions on Information Theory,2006,52(6):2508-2530.
[15]HAAS Z,HALPERN J Y,LI L.Gossip-based ad hoc routing [J].IEEE/ACM Transactions on Networking,2006,14(3):479-491.
[16]BOHANNON P,FAN W,GEERTS F,et al.Conditional Functional Dependencies for Data Cleaning[C]//IEEE,International Conference on Data Engineering.Piscataway,NJ:IEEE,2007:746-755.
[17]FAN W,GEERTS F,JIA X.Improving data quality:consistency and accuracy[J].Pakistan Journal of Biological Sciences,2007,7(6):315-326.
[18]ZHANG A Z,MEN X Y,WANG H Z,et al.Hadoop-Based Inconsistence Detection and Reparation Algorithm for Big Data.Journal of Frontiers of Computer Science and Technology,2015,9(9):1044-1055.
[19]ANDERSON E,LI X,SHAH M A,et al.What consistency does your key-value store actually provide?[C]//Proc of Workshop on Hot Topics in System Dependability.New York:ACM,2010:1-16.
[20]COLOMBO D,MAATHUIS M H.Learning high-dimensional directed acyclic graphs with latent and selection variables[J].Annals of Statistics,2012,40(2012):294-321.
[21]WANG Y,CAO S,GUO H,et al.Communication aware multiple directed acyclic graph scheduling considering cost and fairness[J].Journal of Computer Applications,2015,37(3):316-321.
[22]BING S,ZHEN Z,BING W,et al.Scene Segmentation with DAG-Recurrent Neural Networks[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2018,40(6):1480-1493.
[1] 凌飞, 陈世平.
基于区块链的企业联盟共享数字积分管理机制
Shared Digital Credits Management Mechanism of Enterprise Alliance Based on Blockchain
计算机科学, 2021, 48(11A): 533-539. https://doi.org/10.11896/jsjkx.201200170
[2] 曾 一,李函逾,刘慧君,余双双,周 波.
UML模型和Java代码之间的一致性检测方法
Consistency Detection Method between UML Model and Java Source Code
计算机科学, 2015, 42(4): 151-155. https://doi.org/10.11896/j.issn.1002-137X.2015.04.030
[3] 上超望,刘清堂,赵刚,童名文.
基于CPN的BPEL活动协同授权一致性检测机制研究
Study on Mechanism of Consistency Detection in BPEL Activities Authorization Coordination Based on CPN
计算机科学, 2014, 41(7): 81-85. https://doi.org/10.11896/j.issn.1002-137X.2014.07.016
[4] 张伟丽,江春华,魏劲超.
MySQL复制技术的研究及应用
MySQL Replication Technology and its Applications
计算机科学, 2012, 39(Z11): 168-170.
[5] 何明,陈国华,梁文辉,赖海光,凌晨.
物联网环境下云数据存储安全及隐私保护策略研究
Cloud Data Storage Security and Privacy Protection Policies under IoT Environment
计算机科学, 2012, 39(5): 62-65.
[6] .
全局频繁闭项目集挖掘算法研究

计算机科学, 2008, 35(1): 193-195.
[7] 黄贤英 王柯柯 范伟.
基于星型网络的分布式关联规则挖掘算法研究

计算机科学, 2007, 34(12): 180-181.
[8] 林剑柠 吴慧中 陈学勤.
基于任务复制的网格任务调度算法

计算机科学, 2006, 33(6): 89-92.
[9] 王治和 扬延娇 张强.
基于对象服务的分布式数据库系统排队模型与分析

计算机科学, 2005, 32(8): 73-74.
[10] 龚斌 孟祥旭 杨承磊.
基于图案的分布式协同系统的设计与实现

计算机科学, 2005, 32(6): 221-223.
[11] 冯玉才 胡泉 班鹏新.
基于DM4的Java存储过程的研究

计算机科学, 2005, 32(6): 51-53.
[12] 刘振鹏 张寿华 蒋建宾 杨绍芸.
基于分布式Web数据库的移动代理平台

计算机科学, 2004, 31(B09): 51-53.
[13] 张成洪 古晓洪 白延红.
Web数据抽取技术研究进展

计算机科学, 2004, 31(2): 129-131.
[14] 钟国祥 张为群.
采用移动代理技术建立数据仓库的研究

计算机科学, 2003, 30(9): 138-140.
[15] 姚炎炎 陈怀义 等.
密码体制与分布式Web数据库的安全设计

计算机科学, 2001, 28(6): 6-9.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!