Computer Science, 2018, Vol. 45, Issue (11A): 591-596.

Section: Comprehensive, Interdisciplinary and Applied

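Implementation and Optimization of SOM Algorithm on Sunway Many-core Processors

YAO Qing, ZHENG Kai, LIU Yao, WANG Su, SUN Jun, XU Meng-xuan

  1. College of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China;
    State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214215, China
  • Online: 2019-02-26 Published: 2019-02-26
  • Corresponding author: ZHENG Kai (1968-), male, Ph.D., associate professor; research interests: computer networks and cloud computing. E-mail: kzheng@cs.ecnu.edu.cn
  • About the authors: YAO Qing (1992-), male, master's student; research interest: high-performance computing. LIU Yao (1981-), male, M.S., senior engineer; research interests: computer networks and high-performance computing. WANG Su (1980-), female, Ph.D., lecturer; research interests: intelligent optimization and information systems. SUN Jun (1993-), male, master's student; research interests: high-performance computing and machine learning. XU Meng-xuan (1994-), male, master's student; research interest: high-performance computing.
  • Funding: This work was supported by the Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2016A05) and the Experimental Teaching Equipment Development Fund of East China Normal University (41000-10201-562930/004/006).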

Abstract: The self-organizing map (SOM) is a classical and widely used machine learning algorithm, but its execution time increases sharply when it processes complex data. Parallelization is an effective way to address this problem. This paper targets the “Sunway TaihuLight” heterogeneous supercomputer, which ranked first on the TOP500 list at the time of writing. Working from the perspectives of model parallelism and data parallelism, it designs parallel SOM implementations for a single core group and for multiple core groups of the Sunway many-core processor. First, the program is refactored so that the main calculation steps of SOM become matrix operations, and the vector computations are parallelized with a high-performance extended math library. Second, several optimization methods tailored to the supercomputer's hardware are applied to tune performance further. Together, these measures improve the algorithm's performance substantially. In the experiments, the overall speedup exceeds 10000 when 64 core groups are used. The CPE speedup, meaning the speedup contributed by the computing processing elements (slave cores), reaches more than 900 at most. This shows that the proposed algorithm makes effective use of the many CPEs in each core group of the “Sunway 26010” processor.
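The abstract's central refactoring step recasts SOM's main computations as matrix operations. This can be illustrated with the best-matching-unit (BMU) search, in a minimal sketch that assumes squared Euclidean distance (the abstract does not state the metric). Expanding ‖x − w‖² = ‖x‖² − 2x·w + ‖w‖² turns the distances between a batch of samples X and all neurons W into a single matrix product XWᵀ plus two vectors of squared norms. That matrix product is the kind of operation a high-performance math library accelerates.

All function names below are hypothetical, and plain Python stands in for the math library. The split of neurons into `n_chunks` slices is also an assumption. It only mimics a model-parallel layout in which each of a core group's 64 CPEs owns a slice of the map, with the chunk-local minima then reduced to a global BMU. This is not the authors' code.

```python
import math

# Hypothetical sketch, not the paper's code: BMU search recast as a matrix
# product, with the neurons split into chunks that stand in for CPEs.

def matmul_bt(a, b):
    """Return A·Bᵀ for row-major lists: result[i][j] = dot(a[i], b[j])."""
    return [[sum(p * q for p, q in zip(row_a, row_b)) for row_b in b] for row_a in a]

def sq_dists_matrix_form(x, w):
    """Squared Euclidean distances via ||x||^2 - 2 x·w + ||w||^2 (one matrix product)."""
    xw = matmul_bt(x, w)
    x_norm = [sum(v * v for v in row) for row in x]
    w_norm = [sum(v * v for v in row) for row in w]
    return [[x_norm[i] - 2.0 * xw[i][j] + w_norm[j] for j in range(len(w))]
            for i in range(len(x))]

def bmu_chunked(x, w, n_chunks):
    """BMU index per sample. Each chunk of neurons is searched independently, then the
    chunk-local minima are reduced (ties go to the smaller neuron index)."""
    n = len(w)
    bounds = [n * c // n_chunks for c in range(n_chunks + 1)]
    best = [(math.inf, -1)] * len(x)
    for c in range(n_chunks):
        lo, hi = bounds[c], bounds[c + 1]
        if lo == hi:  # more chunks than neurons: this chunk has no work
            continue
        d = sq_dists_matrix_form(x, w[lo:hi])
        for i, row in enumerate(d):
            j = min(range(hi - lo), key=row.__getitem__)  # chunk-local BMU
            best[i] = min(best[i], (row[j], lo + j))      # global reduction
    return [idx for _, idx in best]

if __name__ == "__main__":
    X = [[0, 0], [9, 9], [4, 5]]
    W = [[0, 1], [5, 5], [10, 10], [3, 3]]
    print(bmu_chunked(X, W, n_chunks=2))  # prints [0, 2, 1]
```

In floating point, the expanded form can lose precision when ‖x‖² and ‖w‖² are large and nearly equal. The example uses small integers, so the expanded and direct distances agree exactly.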

Key words: Athread, MPI, Parallel computing, Self-organizing neural network, Sunway TaihuLight
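The keywords pair Athread, used inside a core group, with MPI, used across core groups. The data-parallel, multi-core-group side of the abstract can be sketched with batch SOM. Batch SOM is a common way to make SOM data-parallel, but the abstract does not confirm that the authors use it.

In this formulation, each core group processes its own shard of the samples. For every neuron j, it accumulates a numerator Σ h_j(x)·x and a denominator Σ h_j(x). Summing these partial sums over all groups gives exactly the update a single process would compute. In a real run this sum would be an `MPI_Allreduce` with `MPI_SUM`; the update is then the numerator divided by the denominator.

The sketch simulates the shards in one Python process. `allreduce_sum` and the other names are hypothetical, and the Gaussian neighborhood with a fixed `sigma` is an assumed choice.

```python
import math

# Hypothetical sketch: one batch-SOM epoch with the data split into shards that
# stand in for core groups. Not the paper's implementation.

def gaussian_neighborhood(grid, bmu, sigma):
    """Neighborhood weights h_j = exp(-|r_j - r_bmu|^2 / (2 sigma^2)) on the map grid."""
    rb = grid[bmu]
    return [math.exp(-((r[0] - rb[0]) ** 2 + (r[1] - rb[1]) ** 2) / (2.0 * sigma * sigma))
            for r in grid]

def find_bmu(x, w):
    """Index of the neuron whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(w)), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, w[j])))

def partial_sums(shard, w, grid, sigma):
    """Local work of one shard: numerator and denominator of the batch update."""
    dim = len(w[0])
    num = [[0.0] * dim for _ in w]
    den = [0.0] * len(w)
    for x in shard:
        h = gaussian_neighborhood(grid, find_bmu(x, w), sigma)
        for j, hj in enumerate(h):
            den[j] += hj
            for k in range(dim):
                num[j][k] += hj * x[k]
    return num, den

def allreduce_sum(parts):
    """Stand-in for MPI_Allreduce with MPI_SUM: element-wise sum over all shards."""
    nums = [p[0] for p in parts]
    dens = [p[1] for p in parts]
    n_neurons, dim = len(nums[0]), len(nums[0][0])
    num = [[sum(n[j][k] for n in nums) for k in range(dim)] for j in range(n_neurons)]
    den = [sum(d[j] for d in dens) for j in range(n_neurons)]
    return num, den

def batch_epoch(data, w, grid, sigma, n_shards):
    """One batch-SOM epoch; n_shards only changes how the sums are partitioned."""
    shards = [data[s::n_shards] for s in range(n_shards)]
    num, den = allreduce_sum([partial_sums(sh, w, grid, sigma) for sh in shards])
    # A neuron with no neighborhood mass keeps its previous weights.
    return [[num[j][k] / den[j] for k in range(len(w[j]))] if den[j] > 0 else list(w[j])
            for j in range(len(w))]

if __name__ == "__main__":
    grid = [(r, c) for r in range(2) for c in range(3)]  # assumed 2x3 map
    data = [[0.0, 0.0], [1.0, 0.2], [0.9, 1.0], [0.1, 0.8], [0.5, 0.5], [0.3, 0.9]]
    w0 = [[0.1 * j, 0.05 * j] for j in range(len(grid))]
    serial = batch_epoch(data, w0, grid, sigma=1.0, n_shards=1)
    sharded = batch_epoch(data, w0, grid, sigma=1.0, n_shards=3)
    print(all(math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)
              for ra, rb in zip(serial, sharded) for a, b in zip(ra, rb)))
```

The shard results are combined by addition, so the number of shards changes only the floating-point summation order, not the mathematical result. This property is what would let such an update scale across core groups.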

CLC number: TP311.52