Computer Science, 2019, Vol. 46, Issue (11): 11-19. doi: 10.11896/jsjkx.191100500C
• Surveys •
JIA Xun, QIAN Lei, WU Gui-ming, WU Dong, XIE Xiang-hui
[1] | YIN Hong-jun, DENG Nan, CHENG Ya-di. Teleoperation Method for Hexapod Robot Based on Acceleration Fuzzy Control [J]. Computer Science, 2022, 49(6A): 714-722. |
[2] | GAO Jie, LIU Sha, HUANG Ze-qiang, ZHENG Tian-yu, LIU Xin, QI Feng-bin. Deep Neural Network Operator Acceleration Library Optimization Based on Domestic Many-core Processor [J]. Computer Science, 2022, 49(5): 355-362. |
[3] | CHEN Yong, XU Qi, WANG Xiao-ming, GAO Jin-yu, SHEN Rui-juan. Energy Efficient Power Allocation for MIMO-NOMA Communication Systems [J]. Computer Science, 2021, 48(6A): 398-403. |
[4] | WANG Deng-tian, ZHOU Hua, QIAN He-yue. LDPC Adaptive Minimum Sum Decoding Algorithm and Its FPGA Implementation [J]. Computer Science, 2021, 48(6A): 608-612. |
[5] | GUO Biao, TANG Qi, WEN Zhi-min, FU Juan, WANG Ling, WEI Ji-bo. List-based Software and Hardware Partitioning Algorithm for Dynamic Partial Reconfigurable System-on-Chip [J]. Computer Science, 2021, 48(6): 19-25. |
[6] | QI Yan-rong, ZHOU Xia-bing, LI Bin, ZHOU Qing-lei. FPGA-based CNN Image Recognition Acceleration and Optimization [J]. Computer Science, 2021, 48(4): 205-212. |
[7] | CHENG Yun-fei, TIAN Hong-xin, LIU Zu-jun. Collaborative Optimization of Joint User Association and Power Control in NOMA Heterogeneous Network [J]. Computer Science, 2021, 48(3): 269-274. |
[8] | CHEN Guo-liang, ZHANG Yu-jie. Development of Parallel Computing Subject [J]. Computer Science, 2020, 47(8): 1-4. |
[9] | WANG Zhe, TANG Qi, WANG Ling, WEI Ji-bo. Joint Optimization Algorithm for Partition-Scheduling of Dynamic Partial Reconfigurable Systems Based on Simulated Annealing [J]. Computer Science, 2020, 47(8): 26-31. |
[10] | LI Yu-rong, LIU Jie, LIU Ya-lin, GONG Chun-ye, WANG Yong. Parallel Algorithm of Deep Transductive Non-negative Matrix Factorization for Speech Separation [J]. Computer Science, 2020, 47(8): 49-55. |
[11] | WANG Liang, ZHOU Xin-zhi, YAN Hua. Real-time SIFT Algorithm Based on GPU [J]. Computer Science, 2020, 47(8): 105-111. |
[12] | ZHANG Long-xin, ZHOU Li-qian, WEN Hong, XIAO Man-sheng, DENG Xiao-jun. Energy Efficient Scheduling Algorithm of Workflows with Cost Constraint in Heterogeneous Cloud Computing Systems [J]. Computer Science, 2020, 47(8): 112-118. |
[13] | CHEN Li-feng, ZHU Lu-ping. Encrypted Dynamic Configuration Method of FPGA Based on Cloud [J]. Computer Science, 2020, 47(7): 278-281. |
[14] | ZHAO Bo, YANG Ming, TANG Zhi-wei, CAI Yu-xin. Intelligent Video Surveillance Systems Based on FPGA [J]. Computer Science, 2020, 47(6A): 609-611. |
[15] | CAI Yu-xin, TANG Zhi-wei, ZHAO Bo, YANG Ming, WU Yu-fei. Accelerated Software System Based on Embedded Multicore DSP [J]. Computer Science, 2020, 47(6A): 622-625. |