Computer Science, 2017, Vol. 44, Issue (4): 43-46. doi: 10.11896/j.issn.1002-137X.2017.04.010

Time Series Based Killer Task Online Recognition Approach

TANG Hong-yan, LI Ying, JIA Tong and YUAN Xiao-yong   

  • Online:2018-11-13 Published:2018-11-13

Abstract: By analyzing failure frequency and failure patterns in the Google cluster dataset, this paper identifies so-called killer tasks, which suffer from frequent and continuous failures. Killer tasks are a major concern for cloud systems because they waste resources and significantly increase scheduling overhead. This paper proposes an online recognition approach that uses resource usage time series to recognize killer tasks precisely at a very early stage of their occurrence, so that proactive actions can be taken to avoid rescheduling and resource waste. Experimental results show that the proposed approach achieves 98.5% precision in recognizing killer tasks at 3% of the failure duration, with an average resource saving of 96.75% for the cloud system.

Key words: Cloud system, Killer tasks, Online recognition, Time series, Resource usage pattern, Failure frequency
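
The two figures reported in the abstract, recognition precision and resource saving, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The names `TaskOutcome`, `precision`, and `mean_resource_saving` are hypothetical, and the sketch assumes that a task consumes resources uniformly over its failure duration, so that stopping a killer task after a fraction f of that duration saves 1 - f of its resources. The abstract does not state how the saving was actually measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskOutcome:
    """Hypothetical record of one task's recognition result."""
    is_killer: bool    # ground truth: the task really is a killer task
    flagged: bool      # the recognizer flagged the task as a killer task
    flagged_at: float  # fraction of the failure duration elapsed when flagged (0..1)


def precision(outcomes: List[TaskOutcome]) -> float:
    """Share of flagged tasks that are true killer tasks: TP / (TP + FP)."""
    flagged = [o for o in outcomes if o.flagged]
    if not flagged:
        return 0.0
    return sum(1 for o in flagged if o.is_killer) / len(flagged)


def mean_resource_saving(outcomes: List[TaskOutcome]) -> float:
    """Average saved fraction over correctly flagged killer tasks.

    Assumption (ours, not the paper's): resource consumption is uniform
    over the failure duration, so flagging at fraction f saves 1 - f.
    """
    hits = [o for o in outcomes if o.flagged and o.is_killer]
    if not hits:
        return 0.0
    return sum(1.0 - o.flagged_at for o in hits) / len(hits)


# Made-up example data, for illustration only.
outcomes = [
    TaskOutcome(is_killer=True, flagged=True, flagged_at=0.03),
    TaskOutcome(is_killer=True, flagged=True, flagged_at=0.05),
    TaskOutcome(is_killer=False, flagged=True, flagged_at=0.03),
    TaskOutcome(is_killer=True, flagged=False, flagged_at=1.0),
]
print(precision(outcomes))
print(mean_resource_saving(outcomes))
```

Under this simplified model, earlier recognition translates directly into a larger saving, which is why the approach targets the very early stage of a killer task's occurrence.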

