Computer Science ›› 2019, Vol. 46 ›› Issue (9): 28-35. doi: 10.11896/j.issn.1002-137X.2019.09.004

• Surveys •

Task Scheduling on Storm: Current Situations and Research Prospects

ZHANG Zhou (1), HUANG Guo-rui (2), JIN Pei-quan (1)

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230001, China
  2. PLA 31002, Beijing 100081, China
  • Received: 2018-07-05  Online: 2019-09-15  Published: 2019-09-02

Abstract: Distributed stream processing systems, represented by Apache Storm, provide low-latency processing in complex big data environments, and have therefore attracted wide attention in both academia and industry. In such systems, task scheduling is a critical determinant of performance: a good task scheduler yields higher throughput, lower processing latency, and better resource utilization. However, the original Storm task scheduler requires users to set the parallelism manually and assigns tasks with a simple round-robin method, which leads to poor performance in practical applications. To address this problem, researchers have proposed many strategies for optimizing Storm's task scheduling mechanism. This paper reviews related work on Storm task scheduling. First, the Storm system and its original task scheduling mechanism are introduced, and current optimization techniques for Storm task scheduling are classified. Then, the advantages and disadvantages of these scheduling strategies are summarized and analyzed. Finally, future directions for Storm task scheduling optimization are discussed, to provide a reference for further optimization of, and follow-up research on, the Storm scheduling mechanism.
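
The round-robin assignment mentioned in the abstract can be illustrated with a minimal sketch. This is not Storm's actual scheduler code: the function `round_robin_assign`, the executor names, and the slot names below are all hypothetical, chosen only to show how executors are dealt out to worker slots in turn once the user has fixed the parallelism by hand.

```python
from collections import defaultdict
from itertools import cycle


def round_robin_assign(executors, slots):
    """Assign each executor to a slot in turn, wrapping around the slot list."""
    if not slots:
        raise ValueError("at least one worker slot is required")
    assignment = defaultdict(list)
    # zip stops when the executors run out; cycle repeats the slots endlessly.
    for executor, slot in zip(executors, cycle(slots)):
        assignment[slot].append(executor)
    return dict(assignment)


# Hypothetical topology: the user has manually chosen 2 spout executors
# and 3 bolt executors; the cluster offers 3 worker slots on 2 nodes.
executors = ["spout-0", "spout-1", "bolt-0", "bolt-1", "bolt-2"]
slots = ["node1:6700", "node1:6701", "node2:6700"]

plan = round_robin_assign(executors, slots)
for slot, assigned in plan.items():
    print(slot, assigned)
```

The placement depends only on list order. It takes no account of how much data two executors exchange or how loaded a node is, which is the kind of limitation the optimization strategies surveyed in the paper aim to address.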

Key words: Apache Storm, Real-time scheduling, Stream processing, Task parallelism, Task scheduling

CLC Number: 

  • TP311