Computer Science ›› 2018, Vol. 45 ›› Issue (9): 81-88. doi: 10.11896/j.issn.1002-137X.2018.09.012

• The 16th National Conference on Software and Applications

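Framework Assisting Storm Application Development Driven by Data Requirements

ZHOU Wen, SHI Xue-fei, WU Yi-jian, ZHAO Wen-yun

  1. Software School, Fudan University, Shanghai 201203, China
    Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China
  • Received: 2017-10-05 Online: 2018-09-20 Published: 2018-10-10
  • Corresponding author: WU Yi-jian (born 1979), male, Ph.D., associate professor. His main research interests include software maintenance and evolution and big data application development platforms. E-mail: wuyijian@fudan.edu.cn
  • About the authors: ZHOU Wen (born 1992), female, M.S. Her main research interests include streaming big data and software development platforms. SHI Xue-fei (born 1992), female, M.S. Her main research interests include streaming big data, software development platforms, and in-memory computing system software. ZHAO Wen-yun (born 1964), male, Ph.D., professor, doctoral supervisor. His main research interests include software engineering, enterprise application integration, software development platforms, and big data.
  • Supported by:
    This work was supported by the Shanghai Science and Technology Development Fund (16JC1400801).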

Abstract: Storm, a widely used stream computing framework, supports high-performance real-time computation over stream data. When developing Storm applications, developers have to write custom computation modules for each set of stream data requirements, which causes a large amount of repetitive work and makes it hard to adapt to changing requirements. How to rapidly develop Storm applications and configure the corresponding environment from data requirements, such as stream data formats and computation methods, is therefore an important question for improving the efficiency of stream computing application development. This paper proposes an approach for describing stream data requirements, and designs and implements a Storm-based, data-requirement-driven framework that assists the development of real-time stream data processing applications. From the domain-specific data requirements described by business people, the framework automatically generates Storm applications that satisfy those requirements. Experiments show that the framework helps people without Storm development skills, and even non-developers, quickly configure and deploy common Storm-based stream computing applications, and that it adapts to common real-time stream data processing requirements to a certain extent.

Key words: Data requirements, Development framework, Storm, Stream calculation
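
The abstract does not show the paper's requirement description format or its generation rules, so the following is only an illustrative sketch of the general idea of deriving a processing pipeline from a declarative data requirement. Every name here (`Field`, `DataRequirement`, `plan_topology`, the `spout:` and `bolt:` labels, and the sample source and computations) is a hypothetical assumption and is not taken from the paper or from the Storm API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """One field of an incoming record (hypothetical description format)."""
    name: str
    type: str  # e.g. "string" or "float"


@dataclass
class DataRequirement:
    """Hypothetical requirement: where data comes from, its format, and the computations wanted."""
    source: str
    fields: List[Field]
    computations: List[str] = field(default_factory=list)


def plan_topology(req: DataRequirement) -> List[str]:
    """Turn a requirement into an ordered list of component labels.

    Assumed mapping: one spout for the source, one parsing bolt for the
    declared fields, then one bolt per requested computation, in order.
    """
    field_names = ",".join(f.name for f in req.fields)
    plan = [f"spout:{req.source}", f"bolt:parse({field_names})"]
    plan += [f"bolt:{step}" for step in req.computations]
    return plan


# Example requirement written by a non-developer (all values are made up).
req = DataRequirement(
    source="sensor-topic",
    fields=[Field("device_id", "string"), Field("temperature", "float")],
    computations=["filter", "window_average"],
)
print(plan_topology(req))
```

In a real generator, each label in the plan would be bound to a concrete Storm spout or bolt class and wired into a topology; that step is omitted here because the abstract gives no detail about it.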

CLC Number:

  • TP311.5