Computer Science ›› 2019, Vol. 46 ›› Issue (9): 28-35. doi: 10.11896/j.issn.1002-137X.2019.09.004
张洲1, 黄国锐2, 金培权1
ZHANG Zhou1, HUANG Guo-rui2, JIN Pei-quan1
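Abstract: Distributed stream processing systems, represented by Apache Storm, provide low-latency processing in complex big data environments and have therefore attracted wide attention from both academia and industry. In such systems, task scheduling is a key factor that determines system performance. A good task scheduler brings higher throughput, lower processing latency, and better resource utilization. Storm's native task scheduler requires users to set the parallelism manually and assigns tasks with a simple round-robin method, so it performs poorly in practical applications. To address this problem, researchers have proposed a variety of optimization strategies for Storm's task scheduling mechanism. This paper surveys the work on Storm task scheduling. It first introduces the Storm system and its native task scheduling mechanism, then reviews the optimization techniques proposed so far and summarizes the advantages and disadvantages of each method. Finally, it discusses several future directions for Storm task scheduling optimization, with the aim of providing a reference for further optimization and application of Storm's task scheduling mechanism.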
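As a rough illustration of the round-robin assignment mentioned in the abstract, the following minimal sketch deals executors out to worker slots in turn. It is not Storm's actual scheduler code: the function `round_robin_assign`, the component names, the slot labels, and the parallelism values are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from itertools import cycle


def round_robin_assign(executors, slots):
    """Assign each executor to a worker slot in simple round-robin order.

    Hypothetical helper for illustration; not part of the Storm API.
    """
    if not slots:
        raise ValueError("at least one slot is required")
    assignment = defaultdict(list)
    # cycle() repeats the slot list, so executor i lands on slot i % len(slots).
    for executor, slot in zip(executors, cycle(slots)):
        assignment[slot].append(executor)
    return dict(assignment)


# Parallelism is fixed by hand, as the abstract describes (values are made up).
parallelism = {"spout": 2, "split": 3, "count": 3}
executors = [f"{name}-{i}" for name, n in parallelism.items() for i in range(n)]

# Hypothetical worker slots written as "host:port".
slots = ["node1:6700", "node1:6701", "node2:6700"]

assignment = round_robin_assign(executors, slots)
for slot, assigned in assignment.items():
    print(slot, assigned)
```

The sketch balances only the number of executors per slot. It takes no account of the resource demand of each executor or of the traffic between executors placed on different nodes.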