Computer Science, 2018, Vol. 45, Issue (7): 7-15. doi: 10.11896/j.issn.1002-137X.2018.07.002
CCF Big Data 2017
LIAO Hu-sheng¹, HUANG Shan-shan¹, XU Jun-gang², LIU Ren-feng²