Computer Science, 2020, Vol. 47, Issue (1): 1-6. doi: 10.11896/jsjkx.190900042

• Computer Architecture •

High Performance Computing and Astronomical Data: A Survey

WANG Yang1,LI Peng1,2,JI Yi-mu1,2,FAN Wei-bei1,2,ZHANG Yu-jie1,2,WANG Ru-chuan2,CHEN Guo-liang1,2   

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  2. Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210023, China
  • Received:2019-07-01 Published:2020-01-19
  • About author: WANG Yang, born in 1995, is a postgraduate student. His main research interests include astronomical data processing and analysis. FAN Wei-bei, born in 1987, Ph.D., lecturer, is a member of the China Computer Federation (CCF). His main research interests include parallel and distributed systems, data center networks, and cloud computing.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1003201), the National Natural Science Foundation of China (61672296, 61602261, 61872196, 61872194), and the Scientific and Technological Support Project of Jiangsu Province (BE2017166, BE2019740).

Abstract: Data is an important driver of progress in astronomy. Distributed storage and High Performance Computing (HPC) help address the complexity, irregular storage, and computational demands of massive astronomical data. The integration of multiple information sources and multiple disciplines in astronomical research has become inevitable, and astronomical big data has entered the era of large-scale computing. HPC provides a new means of processing and analyzing astronomical big data, and offers new solutions to problems that traditional methods cannot solve. Based on the classification and characteristics of astronomical data, and with HPC as the supporting technology, this paper surveys the data fusion, efficient access, analysis and subsequent processing, and visualization of astronomical big data, and summarizes the current state of each. It further summarizes the technical characteristics of the current stage, puts forward research strategies and technical methods for dealing with astronomical big data, and discusses the open problems and development trends in astronomical big data processing.

Key words: Astronomical big data, Data processing, Data storage, Data visualization, High performance computing

CLC Number: TP3-05