Computer Science, 2017, Vol. 44, Issue (12): 163-168. doi: 10.11896/j.issn.1002-137X.2017.12.031

• Software and Database Technology •

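Evolutionary CTT-SP Algorithm for Cost-effectively Storing Scientific Datasets in Cloud

GUO Mei, YUAN Dong and YANG Yun

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006; School of Electrical and Information Engineering, The University of Sydney, Sydney 2006; School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne 3122
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    the Key Industry-University-Research Project of Guangdong Province (2014XYD-007) and the Science and Technology Planning Project of Guangdong Province (2012B091000173)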

Abstract: The massive computation power and storage capacity of cloud computing systems allow scientists to deploy computation-intensive and data-intensive applications in the cloud and to store large volumes of application data there. Under the pay-as-you-go model of cloud services, a change in service prices incurs a cost for adjusting the existing storage status of the datasets. Taking this status adjustment cost into account, we propose an evolutionary CTT-SP algorithm, built on the traditional minimum cost benchmarking CTT-SP algorithm, to reduce the cost of storing the large volume of generated scientific datasets. At the new price of cloud services, the algorithm automatically decides whether each generated dataset should be stored, achieving a better trade-off between computation and storage. Experiments on a large number of random datasets, using Amazon's cost model as an example, show that the proposed evolutionary CTT-SP algorithm effectively reduces the total cost of storing scientific datasets after the cloud service price changes.

Key words: Datasets storage, Computation-storage trade-off, Cloud computing, Scientific application, Cost
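
The store-or-regenerate trade-off described in the abstract can be illustrated with a small Python sketch. It is not the authors' CTT-SP implementation. It assumes a linear chain of datasets in which each dataset is derived from its direct predecessor, finds the cheapest storage strategy with a shortest-path style dynamic program, and then compares keeping the old strategy with switching to the new optimum after a price change. All names (`Dataset`, `cost_rate`, `min_cost_strategy`, `adjustment_cost`, `evolve`), the adjustment-cost model (newly stored datasets must be regenerated once, deletion is free), the planning horizon and the prices in the usage lines are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dataset:
    size_gb: float         # size of the generated dataset
    gen_hours: float       # compute hours to derive it from its direct predecessor
    uses_per_month: float  # how often the dataset is requested


def cost_rate(chain: List[Dataset], stored: List[bool],
              storage_price: float, compute_price: float) -> float:
    """Monthly cost of one store/delete decision for a linear chain."""
    total, regen_hours = 0.0, 0.0
    for d, keep in zip(chain, stored):
        if keep:
            total += d.size_gb * storage_price
            regen_hours = 0.0
        else:
            # A deleted dataset is rebuilt from its nearest stored predecessor.
            regen_hours += d.gen_hours
            total += regen_hours * compute_price * d.uses_per_month
    return total


def min_cost_strategy(chain: List[Dataset], storage_price: float,
                      compute_price: float) -> Tuple[float, List[bool]]:
    """Shortest path from a virtual start (node 0) to a virtual end (node n + 1).
    Edge (i, j) means datasets i and j are stored and all between are deleted."""
    n = len(chain)
    best = [0.0] + [float("inf")] * (n + 1)
    prev = [0] * (n + 2)
    for i in range(n + 1):
        deleted_cost, regen_hours = 0.0, 0.0  # for datasets i+1 .. j-1
        for j in range(i + 1, n + 2):
            keep = chain[j - 1].size_gb * storage_price if j <= n else 0.0
            if best[i] + deleted_cost + keep < best[j]:
                best[j], prev[j] = best[i] + deleted_cost + keep, i
            if j <= n:  # on longer edges, dataset j is deleted
                regen_hours += chain[j - 1].gen_hours
                deleted_cost += (regen_hours * compute_price
                                 * chain[j - 1].uses_per_month)
    stored = [False] * n
    j = prev[n + 1]
    while j > 0:
        stored[j - 1] = True
        j = prev[j]
    return best[n + 1], stored


def adjustment_cost(chain: List[Dataset], old: List[bool], new: List[bool],
                    compute_price: float) -> float:
    """Assumed one-time cost of moving from `old` to `new`: each newly stored
    dataset is regenerated once, in chain order; deleting is assumed free."""
    total, regen_hours = 0.0, 0.0
    for d, was, now in zip(chain, old, new):
        if was:
            regen_hours = 0.0
            continue
        regen_hours += d.gen_hours
        if now:
            total += regen_hours * compute_price
            regen_hours = 0.0
    return total


def evolve(chain: List[Dataset], old: List[bool], storage_price: float,
           compute_price: float, horizon_months: float) -> List[bool]:
    """Switch to the new optimum only if it pays off within the horizon.
    Simplification: only two candidate strategies are compared."""
    keep_cost = cost_rate(chain, old, storage_price, compute_price) * horizon_months
    rate, new = min_cost_strategy(chain, storage_price, compute_price)
    switch_cost = (rate * horizon_months
                   + adjustment_cost(chain, old, new, compute_price))
    return new if switch_cost < keep_cost else old


# Usage with made-up datasets and prices (USD per GB-month, USD per hour).
chain = [Dataset(100, 10, 0.1), Dataset(10, 50, 1.0),
         Dataset(500, 1, 0.01), Dataset(5, 20, 2.0)]
_, old_strategy = min_cost_strategy(chain, 0.15, 0.10)
new_strategy = evolve(chain, old_strategy, 0.03, 0.10, horizon_months=12)
print(old_strategy, new_strategy)
```

Comparing only the old strategy with the new optimum keeps the sketch short. A fuller treatment would fold the adjustment cost into the path weights, so that intermediate strategies that change only a few datasets are also considered.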

