Computer Science ›› 2021, Vol. 48 ›› Issue (12): 24-28.doi: 10.11896/jsjkx.210600213

• Computer Architecture • Previous Articles     Next Articles

Parallel WMD Algorithm Based on GPU Acceleration

HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li   

  1. College of Computer Science and Electronic Engineering,Hunan University,Changsha 410082,China
  • Received:2021-06-25 Revised:2021-07-16 Online:2021-12-15 Published:2021-11-26
  • About author:HU Rong,born in 1997,postgraduate.Her main research interests include high-performance computing and so on.
    YANG Wang-dong,born in 1974,professor.His main research interests include high performance computing and parallel computing.
  • Supported by:
    National Key R & D Program of China(2018YFB0204302),Key Program of National Natural Science Foundation of China(92055213) and National Natural Science Foundation of China(61872127,61751204).

Abstract: Word Mover's Distance (WMD) is a method of measuring text similarity.It defines the difference between two texts as the minimum distance between the word embedding vectors of the text.WMD uses the vocabulary to represent the text as a normalized bag-of-words vector.Since the words of the text occupies a small proportion in the corpus,the document vector gene-rated by the bag-of-words model is very sparse.Multiple documents can form a high-dimensional sparse matrix,such a sparse matrix will generate a lot of unnecessary operations.By calculating the WMD of a single source document for multiple target documents at once,the calculation process can be highly parallelized.Aiming at the sparsity of text vectors,this paper proposes a GPU-based parallel Sinkhorn-WMD algorithm,which uses compressed format to store target text to improve memory utilization,and reduces intermediate calculations based on the sparse structure.The pre-trained word embedding vector is used to calculate the word distance matrix,the WMD algorithm is improved,and the optimization algorithm is verified on two public news data sets.The experimental results show that the parallel algorithm on NVIDIA TITAN RTX can achieve a speedup of up to 67.43× compared with the CPU serial algorithm.

Key words: GPU, Parallel computing, Sparse matrix multiplication, Text similarity, WMD

CLC Number: 

  • TP391
[1]BROKOS G I,MALAKASIOTIS P,ANDROUTSOPOULOS I.Using Centroids of Word Embeddings and Word Mover's Distance for Biomedical Document Retrieval in Question Answering[J].arXiv:1608.03905,2016.
[2]LU C,ARONSON A,SHOOSHAN S,et al.Spell checker for consumer language (CSpell)[J].Journal of the American Medical Informatics Association,2019,26:211-218.
[3]ALZAHRANI S,SALIM N.Fuzzy semantic-based string similarity for extrinsic plagiarism detection[J].Braschler and Harman,2010,1176:1-8.
[4]KUSNER M,SUN Y,KOLKIN N,et al.From word embeddings to document distances[C]//International Conference on Machine Learning.PMLR,2015:957-966.
[5]MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems.2013:3111-3119.
[6]MIKOLOV T,CHEN K,CORRADO G,et al.Efficient Estimation of Word Representations in Vector Space[J].arXiv:1301.3781,2013.
[7]LIU B.Text sentiment analysis based on CBOW model and deep learning in big data environment[J].Journal of Ambient Intelligence and Humanized Computing,2020,11(2):451-458.
[8]SHAZEER N,PELEMANS J,CHELBA C.Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation[J].arXiv:1412.1454,2015.
[9]TITHI J J,PETRINI F.An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute the Word Mover's Distance[J].arXiv:2005.06727,2021.
[10]RUBNER Y,TOMASI C,GUIBAS L J.The Earth Mover's Distance as a Metric for Image Retrieval[J].International Journal of Computer Vision,2000,40(2):99-121.
[11]PEYRÉ G,CUTURI M.Computational optimal transport:With applications to data science[J].Foundations and Trends© in Machine Learning,2019,11(5/6):355-607.
[12]ALVAREZ-MELIS D,JEGELKA S,JAAKKOLA T S.To- wards optimal transport with global invariances[C]//22nd International Conference on Artificial Intelligence and Statistics.PMLR,2019:1870-1879.
[13]CORENFLOS A,THORNTON J,DOUCET A,et al.Differen- tiable Particle Filtering via Entropy-Regularized Optimal Transport[J].arXiv:2102.07850,2021.
[14]FEYDY J,ROUSSILLON P,TROUVÉ A,et al.Fast and scalable optimal transport for brain tractograms[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention.Cham:Springer,2019:636-644.
[15]CUTURI M.Sinkhorn Distances:Lightspeed Computation of Optimal Transportation Distances[J].arXiv:1306.0895,2013.
[16]KNIGHT P A.The Sinkhorn-Knopp Algorithm:Convergence and Applications[J].SIAM Journal on Matrix Analysis and Applications,2008,30(1):261-275.
[17]CUDA C++ Best Practices Guide[EB/OL].(2021-07-13). http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.
[1] GUO Zheng-wei, FU Ze-wen, LI Ning, BAI Lan. Study on Acceleration Algorithm for Raw Data Simulation of High Resolution Squint Spotlight SAR [J]. Computer Science, 2022, 49(8): 178-183.
[2] ZONG Di-di, XIE Yi-wu. Model Medial Axis Generation Method Based on Normal Iteration [J]. Computer Science, 2022, 49(6A): 764-770.
[3] CHEN Xin, LI Fang, DING Hai-xin, SUN Wei-ze, LIU Xin, CHEN De-xun, YE Yue-jin, HE Xiang. Parallel Optimization Method of Unstructured-grid Computing in CFD for DomesticHeterogeneous Many-core Architecture [J]. Computer Science, 2022, 49(6): 99-107.
[4] WANG Jin, LIU Jiang. GPU-based Parallel DILU Preconditioning Technique [J]. Computer Science, 2022, 49(6): 108-118.
[5] FU Tian-hao, TIAN Hong-yun, JIN Yu-yang, YANG Zhang, ZHAI Ji-dong, WU Lin-ping, XU Xiao-wen. Performance Skeleton Analysis Method Towards Component-based Parallel Applications [J]. Computer Science, 2021, 48(6): 1-9.
[6] HE Ya-ru, PANG Jian-min, XU Jin-long, ZHU Yu, TAO Xiao-han. Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform [J]. Computer Science, 2021, 48(6): 34-40.
[7] LI Fan, YAN Xing, ZHANG Xiao-yu. Optimization of GPU-based Eigenface Algorithm [J]. Computer Science, 2021, 48(4): 197-204.
[8] MA Meng-yu, WU Ye, CHEN Luo, WU Jiang-jiang, LI Jun, JING Ning. Display-oriented Data Visualization Technique for Large-scale Geographic Vector Data [J]. Computer Science, 2020, 47(9): 117-122.
[9] CHEN Guo-liang, ZHANG Yu-jie, . Development of Parallel Computing Subject [J]. Computer Science, 2020, 47(8): 1-4.
[10] YANG Wang-dong, WANG Hao-tian, ZHANG Yu-feng, LIN Sheng-le, CAI Qin-yun. Survey of Heterogeneous Hybrid Parallel Computing [J]. Computer Science, 2020, 47(8): 5-16.
[11] LIU Shi-fang, ZHAO Yong-hua, YU Tian-yu, HUANG Rong-feng. Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster [J]. Computer Science, 2020, 47(4): 6-12.
[12] ZUO Xian-yu, ZHANG Zhe, SU Yue-han, LIU Yang, GE Qiang, TIAN Jun-feng. Extraction Algorithm of NDVI Based on GPU Multi-stream Parallel Model [J]. Computer Science, 2020, 47(4): 25-29.
[13] YANG Zong-lin, LI Tian-rui, LIU Sheng-jiu, YIN Cheng-feng, JIA Zhen, ZHU Jie. Streaming Parallel Text Proofreading Based on Spark Streaming [J]. Computer Science, 2020, 47(4): 36-41.
[14] JIA Jing-dong, ZHANG Xiao-man, HAO Lu, TAN Huo-bin. Analysis of Focuses of Requirements Engineering in Industry [J]. Computer Science, 2020, 47(12): 25-34.
[15] DENG Ding-sheng. Application of Improved DBSCAN Algorithm on Spark Platform [J]. Computer Science, 2020, 47(11A): 425-429.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!