Computer Science, 2020, Vol. 47, Issue (1): 1-6. doi: 10.11896/jsjkx.190900042

• Computer Architecture

High Performance Computing and Astronomical Data: A Survey

WANG Yang(1), LI Peng(1,2), JI Yi-mu(1,2), FAN Wei-bei(1,2), ZHANG Yu-jie(1,2), WANG Ru-chuan(2), CHEN Guo-liang(1,2)

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  2. Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210023, China
  • Received: 2019-07-01 Published: 2020-01-19
  • About author: WANG Yang, born in 1995, postgraduate. His main research interests include astronomical data processing and analysis; FAN Wei-bei, born in 1987, Ph.D., lecturer, is a member of the China Computer Federation (CCF). His main research interests include parallel and distributed systems, data center networks, and cloud computing.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1003201), the National Natural Science Foundation of China (61672296, 61602261, 61872196, 61872194), and the Scientific and Technological Support Project of Jiangsu Province (BE2017166, BE2019740).

Abstract: Data is an important driver of astronomical development. Distributed storage and high performance computing (HPC) help address the complexity, irregular storage, and computational demands of massive astronomical data. The integration of multiple information sources and disciplines in astronomical research has become inevitable, and astronomical big data has entered the era of large-scale computing. HPC provides a new means of processing and analyzing astronomical big data and offers new solutions to problems that traditional methods cannot solve. Based on the classification and characteristics of astronomical data, and with HPC as the supporting technology, this paper surveys data fusion, efficient access, analysis and subsequent processing, and visualization of astronomical big data, and summarizes the current state of each. It further summarizes the technical characteristics of the current stage, proposes research strategies and technical methods for handling astronomical big data, and discusses the open problems and development trends in its processing.

Key words: Astronomical big data, High performance computing, Data storage, Data processing, Data visualization

CLC Number: TP3-05