Computer Science, 2019, Vol. 46, Issue (12): 1-7. doi: 10.11896/jsjkx.190100023
• Big Data and Data Science •
王卓昊1, 杨冬菊2,3, 徐晨阳1
WANG Zhuo-hao1, YANG Dong-ju2,3, XU Chen-yang1
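Abstract: As data warehouses keep growing, the number of ETL (Extraction-Transformation-Loading) tasks in data integration grows with them, and single-machine scheduling can no longer handle today's numerous and complex ETL workloads. To improve ETL scheduling efficiency, shorten the waiting time of critical tasks, and raise resource utilization, this paper builds a distributed ETL task scheduling framework. The framework consists of one scheduler and several executors and schedules ETL tasks in three phases: task preprocessing, task scheduling and allocation, and task execution. In the preprocessing phase, a weight model is built for ETL tasks, and each task's scheduling priority is determined by its weight. In the scheduling and allocation phase, the scheduler restricts which executor nodes may be selected according to each node's performance and load, and a Greedy Balance (GB) algorithm dispatches ETL task execution requests so that the load across executor nodes stays relatively balanced. In the execution phase, the Highest Response Ratio Next (HRRN) algorithm determines the execution order of the tasks queued on each executor node. Experimental results show that the distributed ETL task scheduling framework and the corresponding Integrated Scheduling Execution (ISE) algorithm effectively improve cluster resource utilization and shorten the execution time of task scheduling.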
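The abstract names the GB dispatch and HRRN ordering steps but does not give their exact definitions, so the following is only a minimal sketch under stated assumptions, not the authors' ISE algorithm. It assumes each task already carries a weight from the (unspecified) weight model, that a node's performance is a single hypothetical `capacity` score, that "load" means total assigned service time, and that the node constraint is a hypothetical `max_load` ceiling. All names (`Task`, `Executor`, `greedy_balance`, `hrrn_next`) are illustrative. The HRRN ratio used is the standard one, (waiting time + service time) / service time.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    weight: float        # priority from a weight model (values here are made up)
    service_time: float  # estimated run time; assumed > 0
    arrival_time: float  # time the task joined an executor queue


@dataclass
class Executor:
    name: str
    capacity: float      # hypothetical performance score: higher means faster
    max_load: float      # hypothetical ceiling that excludes overloaded nodes
    load: float = 0.0    # total service time already assigned to this node
    queue: list = field(default_factory=list)


def greedy_balance(tasks, executors):
    """Greedy dispatch: highest-weight task first, each sent to the eligible
    executor whose capacity-normalized load would be lowest afterwards."""
    for task in sorted(tasks, key=lambda t: t.weight, reverse=True):
        # Node constraint: skip executors that would exceed their ceiling.
        eligible = [e for e in executors if e.load + task.service_time <= e.max_load]
        if not eligible:
            raise RuntimeError(f"no executor can accept {task.name}")
        target = min(eligible, key=lambda e: (e.load + task.service_time) / e.capacity)
        target.queue.append(task)
        target.load += task.service_time
    return executors


def response_ratio(task, now):
    """HRRN ratio: (waiting time + service time) / service time."""
    waiting = now - task.arrival_time
    return (waiting + task.service_time) / task.service_time


def hrrn_next(queue, now):
    """Remove and return the queued task with the highest response ratio."""
    best = max(queue, key=lambda t: response_ratio(t, now))
    queue.remove(best)
    return best


if __name__ == "__main__":
    tasks = [
        Task("load_orders", weight=5, service_time=4, arrival_time=0),
        Task("agg_sales", weight=4, service_time=3, arrival_time=0),
        Task("clean_users", weight=3, service_time=2, arrival_time=1),
        Task("sync_dim", weight=1, service_time=1, arrival_time=2),
    ]
    node_a = Executor("node-a", capacity=2.0, max_load=10)
    node_b = Executor("node-b", capacity=1.0, max_load=10)
    greedy_balance(tasks, [node_a, node_b])
    print(node_a.name, [t.name for t in node_a.queue])  # node-a ['load_orders', 'clean_users', 'sync_dim']
    print(node_b.name, [t.name for t in node_b.queue])  # node-b ['agg_sales']
    # Drain node-a's queue with HRRN, advancing the clock by each task's run time.
    now = 5
    while node_a.queue:
        task = hrrn_next(node_a.queue, now)
        print(now, task.name)  # 5 sync_dim, then 6 clean_users, then 8 load_orders
        now += task.service_time
```

Normalizing by `capacity` lets a faster node absorb more work before it looks "busier" than a slower one; the paper may measure performance and load differently. HRRN favors short tasks but lets long-waiting tasks climb in priority, which is why `load_orders` is not starved in the demo.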