Computer Science ›› 2020, Vol. 47 ›› Issue (1): 1-6. doi: 10.11896/jsjkx.190900042

• Computer Architecture •

High Performance Computing and Astronomical Data: A Survey

WANG Yang1, LI Peng1,2, JI Yi-mu1,2, FAN Wei-bei1,2, ZHANG Yu-jie1,2, WANG Ru-chuan2, CHEN Guo-liang1,2

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  2. Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210023, China
  • Received: 2019-07-01 Published: 2020-01-19
  • Corresponding author: FAN Wei-bei (wbfan@njupt.edu.cn)
  • About author: WANG Yang, born in 1995, postgraduate. His main research interests include astronomical data processing and analysis. FAN Wei-bei, born in 1987, Ph.D., lecturer, is a member of the China Computer Federation (CCF). His main research interests include parallel and distributed systems, data center networks and cloud computing.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1003201), the National Natural Science Foundation of China (61672296, 61602261, 61872196, 61872194), and the Scientific and Technological Support Project of Jiangsu Province (BE2017166, BE2019740).

Abstract: Data is an important driver of progress in astronomy. Distributed storage and High Performance Computing (HPC) help cope with the complexity of massive astronomical data and with its irregular storage and computation demands. The fusion of multiple information sources and disciplines in astronomical research has become inevitable, and astronomical big data has entered the era of large-scale computing. HPC provides new means for processing and analyzing astronomical big data and offers new solutions to problems that traditional methods cannot solve. Based on the classification and characteristics of astronomical data, and with HPC as the supporting technology, this paper studies data fusion, efficient access, analysis and subsequent processing, and visualization of astronomical big data. It summarizes the technical characteristics of the current stage, proposes research strategies and technical methods for processing astronomical big data, and discusses the open problems and development trends in this area.

Key words: Astronomical big data, Data processing, Data storage, Data visualization, High performance computing

CLC Number:

  • TP3-05