Computer Science, 2023, Vol. 50, Issue (1): 25-33. doi: 10.11896/jsjkx.220900045

• Database & Big Data & Data Science •

Fast Storage System for Time-series Big Data Streams Based on Waterwheel Model

LU Mingchen, LYU Yanqi, LIU Ruicheng, JIN Peiquan   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • Received: 2022-09-05 Revised: 2022-10-28 Online: 2023-01-15 Published: 2023-01-09
  • About author: LU Mingchen, born in 1997, master. His main research interests include LSM-trees, among other topics.
    JIN Peiquan, born in 1975, Ph.D., associate professor, is a senior member of the China Computer Federation. His main research interests include databases and big data.
  • Supported by:
    National Natural Science Foundation of China (62072419).

Abstract: With the rapid development of the Internet of Things, the scale of sensor deployment has grown steadily in recent years. Large-scale sensor deployments generate massive amounts of streaming data every second, and the value of this data decreases over time. The storage system must therefore withstand the write pressure of high-speed streaming data and persist the data as quickly as possible for subsequent query and analysis, which poses a considerable challenge to its write performance. The fast storage system based on the waterwheel model can meet the fast storage requirements of high-speed time-series data streams in big data application scenarios. The proposed system is deployed between the high-speed data streams and the underlying storage nodes. It uses multiple data buckets to build a logically rotating storage model (similar to the ancient Chinese waterwheel) and coordinates data writing and persistence by controlling the state of each data bucket. The waterwheel sends data buckets to different underlying storage nodes, so that the instantaneous write pressure is distributed evenly across multiple storage nodes and write throughput is improved through multi-node parallel writing. In the experiments, the waterwheel model is deployed on a stand-alone version of MongoDB and compared with distributed MongoDB. The results show that the proposed system effectively improves write throughput, reduces write latency, and has good horizontal scalability.
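
The rotating-bucket idea described in the abstract can be illustrated with a small, single-threaded Python sketch. This is only a toy model under stated assumptions, not the authors' implementation: the state names (`EMPTY`, `FILLING`, `FLUSHING`), the fixed bucket capacity, the round-robin choice of storage node, and the use of in-memory lists in place of real storage nodes are all hypothetical choices made for illustration.

```python
from enum import Enum


class BucketState(Enum):
    # Hypothetical state names; the abstract only says bucket states exist.
    EMPTY = "empty"        # ready to accept writes
    FILLING = "filling"    # currently receiving stream records
    FLUSHING = "flushing"  # sealed and being persisted to a storage node


class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.state = BucketState.EMPTY


class Waterwheel:
    """Toy model of a logically rotating ring of data buckets."""

    def __init__(self, num_buckets, bucket_capacity, num_nodes):
        self.buckets = [Bucket(bucket_capacity) for _ in range(num_buckets)]
        # Each "storage node" is a plain list standing in for a real node.
        self.nodes = [[] for _ in range(num_nodes)]
        self.current = 0     # index of the bucket receiving writes
        self.next_node = 0   # round-robin pointer over storage nodes
        self.buckets[0].state = BucketState.FILLING

    def write(self, record):
        """Append a record to the current bucket; rotate when it is full."""
        bucket = self.buckets[self.current]
        bucket.records.append(record)
        if len(bucket.records) >= bucket.capacity:
            self._rotate()

    def _rotate(self):
        """Seal the full bucket, persist it, and move to the next bucket."""
        full = self.buckets[self.current]
        full.state = BucketState.FLUSHING
        # Assumption: flushing is synchronous here. A real system would
        # persist in the background while the next bucket keeps filling.
        self.nodes[self.next_node].extend(full.records)
        self.next_node = (self.next_node + 1) % len(self.nodes)
        full.records = []
        full.state = BucketState.EMPTY
        # Turn the wheel: the next bucket in the ring starts filling.
        self.current = (self.current + 1) % len(self.buckets)
        self.buckets[self.current].state = BucketState.FILLING


# Usage: three buckets of two records each, spread over two storage nodes.
wheel = Waterwheel(num_buckets=3, bucket_capacity=2, num_nodes=2)
for value in range(7):
    wheel.write(value)
```

Because consecutive full buckets go to different nodes, a burst of writes is spread over all nodes rather than hitting a single one, which is the load-spreading effect the abstract attributes to the model.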

Key words: Time-series big data, Streaming data, Fast storage, Waterwheel model, Middleware

CLC Number: TP311