Computer Science (计算机科学), 2023, Vol. 50, Issue (1): 25-33. doi: 10.11896/jsjkx.220900045
陆铭琛, 吕晏齐, 刘睿诚, 金培权
LU Mingchen, LYU Yanqi, LIU Ruicheng, JIN Peiquan
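Abstract: In recent years, with the rapid development of the Internet of Things, sensor deployments have grown steadily in scale. Large numbers of sensors generate massive data streams every second, and the value of the data decreases as time passes. A storage system therefore has to withstand the write pressure of high-speed incoming data streams and also persist the data as quickly as possible for subsequent querying and analysis, which places higher demands on its write performance. A fast storage system based on the waterwheel model can meet the need for fast storage of high-speed time-series data streams in big data applications. The system is deployed between the high-speed time-series data streams and the underlying storage nodes. It uses multiple data buckets to build a logically rotating storage model (similar to the waterwheel of ancient China) and coordinates data writing and flushing to disk by controlling the state of each bucket. The waterwheel model assigns data buckets to different underlying storage nodes, thereby spreading the instantaneous write pressure across multiple storage nodes and raising write throughput through parallel writes on those nodes. The waterwheel model was deployed on standalone MongoDB and compared experimentally with distributed MongoDB. The experimental results show that the waterwheel model effectively improves the system's write throughput, reduces write latency, and scales well horizontally.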
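The rotating-bucket idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two bucket states (`WRITABLE`, `FLUSHING`), the round-robin assignment of buckets to nodes, the class and method names, and the in-memory lists standing in for storage nodes. A real system would flush buckets to their nodes in parallel; this sketch flushes sequentially to stay short.

```python
from enum import Enum


class BucketState(Enum):
    WRITABLE = "writable"  # bucket accepts incoming records
    FLUSHING = "flushing"  # bucket is full and waiting to be persisted


class Bucket:
    def __init__(self, node_id, capacity):
        self.node_id = node_id    # storage node this bucket is assigned to
        self.capacity = capacity  # records the bucket holds before it rotates
        self.records = []
        self.state = BucketState.WRITABLE


class Waterwheel:
    """Hypothetical rotating set of buckets in front of several storage nodes."""

    def __init__(self, num_buckets, capacity, num_nodes):
        # Assumption: buckets are mapped to nodes round-robin.
        self.buckets = [Bucket(i % num_nodes, capacity) for i in range(num_buckets)]
        self.current = 0
        # Stand-in for real storage nodes: node id -> persisted records.
        self.nodes = {n: [] for n in range(num_nodes)}

    def write(self, record):
        bucket = self.buckets[self.current]
        if bucket.state is not BucketState.WRITABLE:
            raise RuntimeError("no writable bucket; flushing has fallen behind")
        bucket.records.append(record)
        if len(bucket.records) >= bucket.capacity:
            # The full bucket leaves the write position; the wheel turns.
            bucket.state = BucketState.FLUSHING
            self.current = (self.current + 1) % len(self.buckets)

    def flush(self):
        """Persist every full bucket to its node and make it writable again."""
        flushed = 0
        for bucket in self.buckets:
            if bucket.state is BucketState.FLUSHING:
                self.nodes[bucket.node_id].extend(bucket.records)
                bucket.records = []
                bucket.state = BucketState.WRITABLE
                flushed += 1
        return flushed
```

Because consecutive buckets belong to different nodes in this sketch, a burst of writes that fills several buckets ends up spread over several nodes instead of landing on a single one, which is the load-spreading effect the abstract describes.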