Computer Science, 2018, Vol. 45, Issue (9): 81-88. doi: 10.11896/j.issn.1002-137X.2018.09.012
ZHOU Wen (周雯), SHI Xue-fei (史雪菲), WU Yi-jian (吴毅坚), ZHAO Wen-yun (赵文耘)
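Abstract: Storm supports high-performance real-time computation over streaming data and is a widely used stream computing framework. When developing Storm applications, developers must build custom computation modules for each streaming data requirement. This leads to a large amount of duplicated work and makes the resulting applications hard to adapt when data requirements change. Quickly developing a Storm application, and configuring its environment, directly from data requirements such as the stream's data format and the computation to perform is therefore an important problem for improving the development efficiency of most stream computing applications. This paper proposes a method for describing streaming data requirements, and designs and implements a Storm-based, data-requirement-driven framework that assists in developing real-time stream processing applications. From the domain data requirements described by business staff, the framework automatically generates a Storm real-time data processing application that meets those processing requirements. Experiments show that the framework helps people without Storm development skills, and even non-developers, to quickly configure common Storm-based stream computing applications, and that it adapts reasonably well to common real-time stream processing requirements.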
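To make the idea of generating a processing application from a data-requirement description more concrete, here is a minimal sketch. It is not the paper's description language or generator. It assumes a hypothetical requirement that states the input record format (a delimiter and field names) and the computation (a grouping field plus a `count` or `sum` aggregate). It then shows how such a description could drive the spout → parsing bolt → aggregation bolt chain that a typical Storm topology uses. The names `DataRequirement`, `parse_record`, and `build_pipeline` and the example field names are illustrative assumptions. The stages run in-process in plain Python rather than on a Storm cluster.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class DataRequirement:
    """Hypothetical declarative description of a streaming data requirement."""
    delimiter: str          # how a raw record is split into fields
    fields: List[str]       # field names, in record order
    group_by: str           # field used as the grouping key
    aggregate: str          # "count" or "sum"
    value_field: str = ""   # field to sum when aggregate == "sum"


def parse_record(req: DataRequirement, line: str) -> Dict[str, str]:
    """Parsing stage (the role a parsing bolt would play): raw line -> named fields."""
    values = line.strip().split(req.delimiter)
    if len(values) != len(req.fields):
        raise ValueError(f"expected {len(req.fields)} fields, got {len(values)}: {line!r}")
    return dict(zip(req.fields, values))


def build_pipeline(req: DataRequirement) -> Callable[[Iterable[str]], Dict[str, float]]:
    """Derive a processing function from the requirement instead of hand-coding it."""
    # Validate the description up front, as a generator would before emitting code.
    if req.aggregate not in ("count", "sum"):
        raise ValueError(f"unsupported aggregate: {req.aggregate}")
    if req.group_by not in req.fields:
        raise ValueError(f"unknown group_by field: {req.group_by}")
    if req.aggregate == "sum" and req.value_field not in req.fields:
        raise ValueError(f"unknown value_field: {req.value_field}")

    def run(stream: Iterable[str]) -> Dict[str, float]:
        # Aggregation stage (the role a fields-grouped aggregating bolt would play).
        totals: Dict[str, float] = defaultdict(float)
        for line in stream:  # each line stands in for a tuple emitted by a spout
            record = parse_record(req, line)
            key = record[req.group_by]
            if req.aggregate == "count":
                totals[key] += 1
            else:
                totals[key] += float(record[req.value_field])
        return dict(totals)

    return run


if __name__ == "__main__":
    # Example requirement: per-user purchase totals from comma-separated records.
    req = DataRequirement(delimiter=",", fields=["user", "item", "amount"],
                          group_by="user", aggregate="sum", value_field="amount")
    pipeline = build_pipeline(req)
    print(pipeline(["alice,book,12.5", "bob,pen,2", "alice,pen,3"]))
    # prints {'alice': 15.5, 'bob': 2.0}
```

You can change the requirement instead of the code. For example, `group_by="item", aggregate="count"` counts records per item. The processing logic itself stays untouched, which is the kind of reuse a requirement-driven framework aims for. A real generator would instead emit Storm spout and bolt classes and a topology definition to submit to the cluster.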