Computer Science ›› 2018, Vol. 45 ›› Issue (9): 60-64. doi: 10.11896/j.issn.1002-137X.2018.09.008

• NASAC 2017 •

Web Based Lightweight Tool for Big Data Processing and Visualization

LI Yan1,2, MA Jun-ming1, AN Bo1,2, CAO Dong-gang1,2   

  1. Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China
  2. School of Electronic Engineering and Computer Science, Peking University, Beijing 100871, China
  • Received: 2017-08-15 Online: 2018-09-20 Published: 2018-10-10

Abstract: In their daily work, researchers often use Excel, SPSS, and similar tools to analyze and process data and gain knowledge of their field. With the arrival of the big data era, however, general-purpose data processing software is constrained by single-machine performance and cannot meet researchers' needs for big data analysis and processing. Big data processing and visualization depend on a distributed computing environment. To process and visualize big data quickly, researchers therefore not only need to purchase and maintain a distributed cluster, but also need to program in a distributed environment and master the corresponding front-end data visualization technology. For many data analysts without a computer science background, this is very difficult and unnecessary. To address these problems, this paper presents a Web-based lightweight big data processing and visualization tool. With this tool, data analysts can easily open a large data file (GB level) in the browser, and quickly locate, sort, and visualize its contents through simple clicks and drags. Finally, an empirical study was carried out to demonstrate the effectiveness of this solution.

Key words: Data analysis, Big data, Data visualization, Distributed system, Parallel computation

CLC Number: 

  • TP399
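
The abstract describes opening a GB-level file in the browser and quickly locating positions within it. The sketch below is not the authors' implementation; it only illustrates one common way a back end could serve an arbitrary window of rows without loading the whole file, using a sparse byte-offset index. The function names `build_line_index` and `read_page` and the `every` parameter are hypothetical, chosen for illustration.

```python
def build_line_index(path, every=1000):
    """Scan the file once and record the byte offset of every
    `every`-th line start. Returns (offsets, total_line_count)."""
    offsets = []
    line_no = 0
    with open(path, "rb") as f:
        while True:
            pos = f.tell()          # byte offset where this line starts
            line = f.readline()
            if not line:            # end of file
                break
            if line_no % every == 0:
                offsets.append(pos)
            line_no += 1
    return offsets, line_no


def read_page(path, offsets, every, start, count):
    """Return up to `count` lines beginning at 0-based line `start`,
    seeking to the nearest indexed offset instead of reading from the top."""
    slot = start // every
    if start < 0 or slot >= len(offsets):
        return []
    rows = []
    with open(path, "rb") as f:
        f.seek(offsets[slot])
        # Skip forward from the indexed line to the requested line.
        for _ in range(start % every):
            if not f.readline():
                return []
        for _ in range(count):
            line = f.readline()
            if not line:
                break
            rows.append(line.decode("utf-8").rstrip("\r\n"))
    return rows
```

With an index like this, a request for one screenful of rows costs a single seek plus at most `every - 1` skipped lines, which is why a table widget in the browser could page through a large file responsively. Sorting and distributed execution, which the abstract also mentions, are outside the scope of this sketch.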