Computer Science (计算机科学) ›› 2017, Vol. 44 ›› Issue (8): 36-41. doi: 10.11896/j.issn.1002-137X.2017.08.007

• High Performance Computing •

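Research on Acceleration of Matrix Multiplication Based on Parallel Scheduling on MPSoC

YANG Fei, MA Yu-chun, HOU Jin and XU Ning

  1. Hubei Key Laboratory of Intelligent Wireless Communications, South-Central University for Nationalities, Wuhan 430074; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, Hubei Key Laboratory of Intelligent Wireless Communications, South-Central University for Nationalities, Wuhan 430074, Hubei Key Laboratory of Transportation Internet of Things, Wuhan University of Technology, Wuhan 430074
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the European Union Seventh Framework Programme (318521) and the General Program of the National Natural Science Foundation of China (61076035).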

Abstract: Matrix multiplication underlies numerical analysis and graphics and image processing algorithms, and the design of general-purpose matrix multiplication accelerators has long been an active research topic in embedded system design. Because of its high computational complexity and low processing efficiency, however, matrix multiplication often becomes the bottleneck for computation speed in embedded systems. To make matrix multiplication more practical in the embedded domain, this paper proposes a hardware/software co-designed acceleration architecture based on MPSoC (Multiprocessor System-on-Chip). Within this architecture, a matrix partitioning method that accounts for hardware constraints is designed, which yields a general-purpose matrix multiplication accelerator system. In addition, task partitioning and load-balancing scheduling algorithms are proposed that exploit the multiple cores and the hardware function unit of the MPSoC, improving parallel efficiency and the overall system speed-up ratio. Experimental results show that the proposed architecture and algorithms support general matrix multiplication, and that the multi-core parallel scheduling algorithm realized through hardware/software co-design achieves significantly higher computational efficiency than a traditional single-core design.

Key words: Matrix multiplication, MPSoC, Parallel computation, Load balance
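
The two ideas named in the abstract, partitioning a matrix into blocks that fit a hardware limit and spreading the resulting block tasks evenly across cores, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the `tile` parameter is a hypothetical stand-in for the largest block a fixed-size hardware multiplier could accept, and the load balancer is a generic greedy longest-processing-time heuristic chosen purely for illustration. All function names are assumptions.

```python
import heapq
from itertools import product


def tiled_matmul(A, B, tile):
    """Multiply A (n x k) by B (k x m) one tile x tile block at a time.

    `tile` is a hypothetical hardware limit on block size; edge blocks
    are simply smaller when the dimensions are not multiples of it.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0, j0, p0 in product(range(0, n, tile),
                              range(0, m, tile),
                              range(0, k, tile)):
        # One block product: C[i0.., j0..] += A[i0.., p0..] * B[p0.., j0..]
        for i in range(i0, min(i0 + tile, n)):
            for j in range(j0, min(j0 + tile, m)):
                acc = 0
                for p in range(p0, min(p0 + tile, k)):
                    acc += A[i][p] * B[p][j]
                C[i][j] += acc
    return C


def output_tiles(n, m, tile):
    """List output blocks as (row_start, col_start, rows, cols)."""
    return [(i0, j0, min(tile, n - i0), min(tile, m - j0))
            for i0 in range(0, n, tile)
            for j0 in range(0, m, tile)]


def balance(tiles, k, workers):
    """Greedy assignment: largest task first, to the least-loaded worker.

    The cost of an output block is taken as rows * cols * k
    multiply-accumulates (an assumed cost model).
    Returns (loads, assignments), both indexed by worker id.
    """
    heap = [(0, w, []) for w in range(workers)]  # (load, worker id, tasks)
    heapq.heapify(heap)
    for t in sorted(tiles, key=lambda t: t[2] * t[3], reverse=True):
        load, w, tasks = heapq.heappop(heap)
        tasks.append(t)
        heapq.heappush(heap, (load + t[2] * t[3] * k, w, tasks))
    by_worker = sorted(heap, key=lambda entry: entry[1])
    return [e[0] for e in by_worker], [e[2] for e in by_worker]
```

Calling `balance(output_tiles(5, 5, 2), 3, 2)` splits the nine output blocks of a 5 x 5 result between two workers. In a real MPSoC the workers would differ in speed (CPU cores versus a hardware unit), which this uniform cost model does not capture.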

