Computer Science ›› 2020, Vol. 47 ›› Issue (8): 49-55.doi: 10.11896/jsjkx.190900202

;

Previous Articles     Next Articles

Parallel Algorithm of Deep Transductive Non-negative Matrix Factorization for Speech Separation

LI Yu-rong1, LIU Jie1, 2, LIU Ya-lin1, GONG Chun-ye1, WANG Yong1   

  1. 1 Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China
    2 Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha 410073, China
  • Online:2020-08-15 Published:2020-08-10
  • About author:LI Yu-rong, born in 1994, postgra-duate.Her main research interests include DTNMF parallel methods and high performance computing.
    LIU Jie, born in 1969, professor, Ph.D supervisor.His main research interests include parallel algorithm and high performance computing.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China(2018YFB0204301).

Abstract: Non-negative matrix factorization can preserve the non-negative features of the speech signal.It is an important method for speech separation.However, this method has the problems of complicated data operation and computation, it is necessary to propose a parallel method to reduce computation time.Aiming at the calculation problem of speech separation pre-training and separation process, this paper proposes a deep transductive non-negative matrix factorization multi-level parallel algorithm, which considers the data correlation of the iterative update process, and designs a multi-level parallel algorithm between tasks and within tasks.The parallel algorithm at the task level decomposes the training speech to obtain the corresponding base matrix as two independent tasks in parallel calculation.The matrix is divided into rows and columns at the internal process level of the task.The master process distributes the matrix blocks to the slave process, and the slave process receives the current sub-matrix, then the matrix block calculates the result matrix sub-block, and then sends the current process matrix block to the next process to achieve the traversal of each matrix block of the second matrix in all processes, calculating the product of the corresponding sub-block of the result matrix, and finally the sub-block is collected by the main process from the slave process.During the thread-level sub-matrix multiplication operation, an acceleration strategy of generating multiple threads and exchanging data through shared me-mory for sub-matrix block calculation is adopted.This algorithm is the first one to implement deep transduction non-negative matrix factorization algorithm.The experimentis performed on the Tianhe II platform.The test results show that when separating multi-speaker mixed speech signals without changing the separation effect, in the pre-training process, the proposed parallel algorithm performs well using 64 processes at a speed ratio of 18, and in the separation process, the corresponding speedup is 24.Compared to serial and MPI model separation, hybrid model separation time is greatly shortened, which proves that for the speech separation process, the parallel algorithm proposed in this paper can effectively improve the separation efficiency.

Key words: Acceleration algorithm based on multiplicative update rules, MPI, OpenMP, Parallel algorithm of deep-transductive non-negative matrix factorization, Speech separation

CLC Number: 

  • TP391
[1] LEE D D, SEUNG H S.Algorithms for non-negative matrix factorization[C]∥NIPS.2001:556-562.
[2] LEE D D, SEUNG H S.Learning the parts of objects by nonnegative matrix factorization[C]∥Nature.1999, 401:788-791.
[3] ANDRZEJ C, HUY P A, RAFAL Z, et al.Nonnegative matrix and tensor factorizations applications to exploratory multi-way data analysis and blind source separation[M].Wiley Publishing, 2009.
[4] GEMULLA R, NIJKAMP E, HAAS P J, et al.Large-scale matrix factorization with distributed stochastic gradient descent[C]∥Proceedings of the KDD.ACM, 2011:69-77.
[5] KIM J, PARK H.Fast Nonnegative Matrix Factorization:AnActive-Set-Like Method and Comparisons[J].SIAM Journal on Scientific Computing, 2011, 33(6):3261-3281.
[6] DONG C, ZHAO H, WANG W.Parallel Nonnegative MatrixFactorization Algorithm on the Distributed Memory Platform[J].International Journal of Parallel Programming, 2010, 38(2):117-137.
[7] LIU C, YANG H C, FAN J, et al.Distributed nonnegative matrix factorization for web-scale dyadic data analysis on map-reduce[C]∥Proceedings of the 19th International Conference on World Wide Web.ACM, 2010:681-690.
[8] KANJANI K.Parallel Non Negative Matrix Factorization fordocument clustering[Z].CPSC-659 Spring 2007 Course Project:Texas A&M University, 2007.
[9] LOPES N, RIBEIRO B.Non-negative Matrix Factorization.Implementation using Graphics Processing Units[C]∥Internatio-nal Conference on Intelligent Data Engineering & Automated Learning.Springer-Verlag, 2010.
[10]KANNAN R, BALLARD G, PARK H.HPC-NMF:A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization [J].arXiv:1509.09313.
[11]ROBILA S A, MACIAK L G.A parallel unmixing algorithm for
hyperspectral images[C]∥Optics East.International Society for Optics and Photonics, 2006.
[12]MOON G E, SUKUMARAN-RAJAM A, PARTHASARATHY S, et al.PL-NMF:Parallel Locality-Optimized Non-negative Matrix Factorization[J].arXiv:1904.07935, 2109.
[13]MEJA-ROA E, TABAS-MADRID D, SETOAIN J, et al.NMF-mGPU:non-negativematrix factorization nonmulti-GPU systems[J].BMC Bioinfor-matics, 2015, 16(1):43.
[14]LIU Y L.Research on Key Technologies of Speech Separation and Speech Recognition [D].Changsha:National University of Defense Technology, 2018.
[15]CHEN X H, XIE P Z, CHI LH, et al.An efficient SIMD compression format for sparse matrix-vector multiplication[J].Concurrency Computat Pract Exper, 2018, 30:e4800.
[16]MOHAMMADIHA N, SMARAGDIS P, LEIJON A.Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization[J].IEEE Transactions on Audio Speech & Language Processing, 2013, 21(10):2140-2151.
[17]GUAN N, LAN L, TAO D, et al.Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation[C]∥IEEE International Conference on Acoustics, Speech and Signal Processing.IEEE, 2014:2534-2538.
[18]LIU Y, GUAN N, LIU J.Deep Transductive Nonnegative Matrix Factorization for Speech Separation[C]∥2017 16th IEEE International Conference on Machine Learning and Applications(ICMLA).IEEE, 2017.
[19]ZHAO Y H, CHI X B.MPI+OpenMP hybrid programmingmodel based on SMP cluster and effective implementation [J].Microelectronics and Computer, 2005, 22(10):7-11.
[20]GU K H, HUANG M, HE J Y.Research on MPI+OpenMP hybrid programming model based on multi-core cluster[J].Gansu Techenology, 2018, 34(19):10-14.
[21]XU C, DENG X, ZHANG L, et al.Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer[J].Journal of Computational Physics, 2014, 278:275-297.
[22]Intel Math Kernel Library[EB/OL]https://software.intel.com /en-us/mkl-developer-reference-c-blas-level-3-routines.
[1] LU Hao-song, HU Yong-hua, WANG Shu-ying, ZHOU Xin-lian, LI Hui-xiang. Study on Hybrid Resource Heuristic Loop Unrolling Factor Selection Method Based on Vector DSP [J]. Computer Science, 2022, 49(6A): 777-783.
[2] LIU Jiang, LIU Wen-bo, ZHANG Ju. Hybrid MPI+OpenMP Parallel Method on Polyhedral Grid Generation in OpenFoam [J]. Computer Science, 2022, 49(3): 3-10.
[3] CHI Hao-yu, CHEN Chang-bo. Survey on Automatic Tuning of Compilers by Machine Learning [J]. Computer Science, 2022, 49(1): 241-251.
[4] GAO Shi-shun, ZHAO Hai-tao, ZHANG Xiao-ying, WEI Ji-bo. Self-adaptive Intelligent Wireless Propagation Model to Different Scenarios [J]. Computer Science, 2021, 48(7): 324-332.
[5] TANG Zhen, HU Yong-hua, LU Hao-song, WANG Shu-ying. Research on DSP Register Pairs Allocation Algorithm with Weak Assigning Constraints [J]. Computer Science, 2021, 48(6A): 587-595.
[6] XIE Jing-ming, HU Wei-fang, HAN Lin, ZHAO Rong-cai, JING Li-na. Quantum Fourier Transform Simulation Based on “Songshan” Supercomputer System [J]. Computer Science, 2021, 48(12): 36-42.
[7] HU Wei-fang, CHEN Yun, LI Ying-ying, SHANG Jian-dong. Loop Fusion Strategy Based on Data Reuse Analysis in Polyhedral Compilation [J]. Computer Science, 2021, 48(12): 49-58.
[8] JIANG Hua-nan, ZHANG Shuai, LIN Yu-fei, LI Hao. Simulation Optimization and Testing Based on Gazebo of MPI Distributed Parallelism [J]. Computer Science, 2021, 48(11A): 672-677.
[9] ZHANG Ning, FANG Jing-wen, ZHAO Yu-xuan. Bitcoin Price Forecast Based on Mixed LSTM Model [J]. Computer Science, 2021, 48(11A): 39-45.
[10] HU Teng, WANG Yan-ping, ZHANG Xiao-song, NIU Wei-na. Data and Behavior Analysis of Blockchain-based DApp [J]. Computer Science, 2021, 48(11): 116-123.
[11] HAN Lei, HU Jian-peng. Deduplication Algorithm of Abstract Syntax Tree in GCC Based on Trie Tree of Keywords [J]. Computer Science, 2020, 47(9): 47-51.
[12] YANG Ping, WANG Sheng-yuan. Analysis of Target Code Generation Mechanism of CompCert Compiler [J]. Computer Science, 2020, 47(9): 17-23.
[13] GUO Jie, GAO Xi-ran, CHEN Li, FU You, LIU Ying. Parallelizing Multigrid Application Using Data-driven Programming Model [J]. Computer Science, 2020, 47(8): 32-40.
[14] CHI Hao-yu, CHEN Chang-bo. Prediction of Loop Tiling Size Based on Neural Network [J]. Computer Science, 2020, 47(8): 62-70.
[15] LIU Xiao-nan, JING Li-na, WANG Li-xin, WANG Mei-ling. Large-scale Quantum Fourier Transform Simulation Based on SW26010 [J]. Computer Science, 2020, 47(8): 93-97.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!