基于GPU加速的并行WMD算法

doi:10.11896/jsjkx.210600213

计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 24-28.doi: 10.11896/jsjkx.210600213

基于GPU加速的并行WMD算法

胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立

湖南大学信息科学与工程学院长沙410082

收稿日期:2021-06-25 修回日期:2021-07-16 出版日期:2021-12-15 发布日期:2021-11-26
通讯作者: 阳王东(yangwangdong@hnu.edu.cn)
作者简介:upupwords@hnu.edu.cn
基金资助:
国家重点研发计划课题(2018YFB0204302);国家自然科学基金重点项目(92055213);国家自然科学基金项目(61872127,61751204)

Parallel WMD Algorithm Based on GPU Acceleration

HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li

College of Computer Science and Electronic Engineering,Hunan University,Changsha 410082,China

Received:2021-06-25 Revised:2021-07-16 Online:2021-12-15 Published:2021-11-26
About author:HU Rong,born in 1997,postgraduate.Her main research interests include high-performance computing and so on.
YANG Wang-dong,born in 1974,professor.His main research interests include high performance computing and parallel computing.
Supported by:
National Key R & D Program of China(2018YFB0204302),Key Program of National Natural Science Foundation of China(92055213) and National Natural Science Foundation of China(61872127,61751204).

摘要/Abstract

摘要： Word Mover's Distance(WMD)是一种度量文本相似度的方法,它将两个文本之间的差异定义为文本的词嵌入向量之间的最小距离。WMD利用词汇表,将文本表示为归一化的词袋向量。文本的单词在语料中所占的比例很小,因此用词袋模型生成的文本向量很稀疏。多个文本可以组成一个高维的稀疏矩阵,这样的稀疏矩阵会生成大量不必要的运算。通过一次性对多个目标文本计算单个源文本的WMD,可以使计算过程高度并行化。针对文本向量的稀疏性,文中提出了一种基于GPU的并行Sinkhorn-WMD算法,采取压缩格式存储目标文本的方式来提高内存利用率,根据稀疏结构减少中间过程的计算。利用预训练词嵌入向量计算单词距离矩阵,对WMD算法进行改进,在两个公开的新闻数据集上进行优化算法的验证。实验结果表明,在NVIDIA TITAN RTX上并行算法与CPU串行相比最高可以达到67.43倍的加速。

关键词: GPU, WMD, 并行计算, 文本相似度, 稀疏矩阵乘法

Abstract: Word Mover's Distance (WMD) is a method of measuring text similarity.It defines the difference between two texts as the minimum distance between the word embedding vectors of the text.WMD uses the vocabulary to represent the text as a normalized bag-of-words vector.Since the words of the text occupies a small proportion in the corpus,the document vector gene-rated by the bag-of-words model is very sparse.Multiple documents can form a high-dimensional sparse matrix,such a sparse matrix will generate a lot of unnecessary operations.By calculating the WMD of a single source document for multiple target documents at once,the calculation process can be highly parallelized.Aiming at the sparsity of text vectors,this paper proposes a GPU-based parallel Sinkhorn-WMD algorithm,which uses compressed format to store target text to improve memory utilization,and reduces intermediate calculations based on the sparse structure.The pre-trained word embedding vector is used to calculate the word distance matrix,the WMD algorithm is improved,and the optimization algorithm is verified on two public news data sets.The experimental results show that the parallel algorithm on NVIDIA TITAN RTX can achieve a speedup of up to 67.43× compared with the CPU serial algorithm.

Key words: GPU, Parallel computing, Sparse matrix multiplication, Text similarity, WMD

中图分类号:

TP391

胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立. 基于GPU加速的并行WMD算法[J]. 计算机科学, 2021, 48(12): 24-28. https://doi.org/10.11896/jsjkx.210600213

HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li. Parallel WMD Algorithm Based on GPU Acceleration[J]. Computer Science, 2021, 48(12): 24-28. https://doi.org/10.11896/jsjkx.210600213

参考文献

[1]BROKOS G I,MALAKASIOTIS P,ANDROUTSOPOULOS I.Using Centroids of Word Embeddings and Word Mover's Distance for Biomedical Document Retrieval in Question Answering[J].arXiv:1608.03905,2016.
[2]LU C,ARONSON A,SHOOSHAN S,et al.Spell checker for consumer language (CSpell)[J].Journal of the American Medical Informatics Association,2019,26:211-218.
[3]ALZAHRANI S,SALIM N.Fuzzy semantic-based string similarity for extrinsic plagiarism detection[J].Braschler and Harman,2010,1176:1-8.
[4]KUSNER M,SUN Y,KOLKIN N,et al.From word embeddings to document distances[C]//International Conference on Machine Learning.PMLR,2015:957-966.
[5]MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems.2013:3111-3119.
[6]MIKOLOV T,CHEN K,CORRADO G,et al.Efficient Estimation of Word Representations in Vector Space[J].arXiv:1301.3781,2013.
[7]LIU B.Text sentiment analysis based on CBOW model and deep learning in big data environment[J].Journal of Ambient Intelligence and Humanized Computing,2020,11(2):451-458.
[8]SHAZEER N,PELEMANS J,CHELBA C.Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation[J].arXiv:1412.1454,2015.
[9]TITHI J J,PETRINI F.An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute the Word Mover's Distance[J].arXiv:2005.06727,2021.
[10]RUBNER Y,TOMASI C,GUIBAS L J.The Earth Mover's Distance as a Metric for Image Retrieval[J].International Journal of Computer Vision,2000,40(2):99-121.
[11]PEYRÉ G,CUTURI M.Computational optimal transport:With applications to data science[J].Foundations and Trends© in Machine Learning,2019,11(5/6):355-607.
[12]ALVAREZ-MELIS D,JEGELKA S,JAAKKOLA T S.To- wards optimal transport with global invariances[C]//22nd International Conference on Artificial Intelligence and Statistics.PMLR,2019:1870-1879.
[13]CORENFLOS A,THORNTON J,DOUCET A,et al.Differen- tiable Particle Filtering via Entropy-Regularized Optimal Transport[J].arXiv:2102.07850,2021.
[14]FEYDY J,ROUSSILLON P,TROUVÉ A,et al.Fast and scalable optimal transport for brain tractograms[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention.Cham:Springer,2019:636-644.
[15]CUTURI M.Sinkhorn Distances:Lightspeed Computation of Optimal Transportation Distances[J].arXiv:1306.0895,2013.
[16]KNIGHT P A.The Sinkhorn-Knopp Algorithm:Convergence and Applications[J].SIAM Journal on Matrix Analysis and Applications,2008,30(1):261-275.
[17]CUDA C++ Best Practices Guide[EB/OL].(2021-07-13). http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.

相关文章 15

[1]	宗迪迪, 谢益武. 基于法线迭代的模型中轴生成方法 Model Medial Axis Generation Method Based on Normal Iteration 计算机科学, 2022, 49(6A): 764-770. https://doi.org/10.11896/jsjkx.210400050
[2]	陈鑫, 李芳, 丁海昕, 孙唯哲, 刘鑫, 陈德训, 叶跃进, 何香. 面向国产异构众核架构的CFD非结构网格计算并行优化方法 Parallel Optimization Method of Unstructured-grid Computing in CFD for DomesticHeterogeneous Many-core Architecture 计算机科学, 2022, 49(6): 99-107. https://doi.org/10.11896/jsjkx.210400157
[3]	汪晋, 刘江. 基于GPU的并行DILU预处理技术 GPU-based Parallel DILU Preconditioning Technique 计算机科学, 2022, 49(6): 108-118. https://doi.org/10.11896/jsjkx.210300259
[4]	傅天豪, 田鸿运, 金煜阳, 杨章, 翟季冬, 武林平, 徐小文. 一种面向构件化并行应用程序的性能骨架分析方法 Performance Skeleton Analysis Method Towards Component-based Parallel Applications 计算机科学, 2021, 48(6): 1-9. https://doi.org/10.11896/jsjkx.201200115
[5]	何亚茹, 庞建民, 徐金龙, 朱雨, 陶小涵. 基于神威平台的Floyd并行算法的实现和优化 Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform 计算机科学, 2021, 48(6): 34-40. https://doi.org/10.11896/jsjkx.201100051
[6]	李繁, 严星, 张晓宇. 基于GPU的特征脸算法优化研究 Optimization of GPU-based Eigenface Algorithm 计算机科学, 2021, 48(4): 197-204. https://doi.org/10.11896/jsjkx.200600033
[7]	冯凯, 马鑫玉. (n,k)-冒泡排序网络的子网络可靠性 Subnetwork Reliability of (n,k)-bubble-sort Networks 计算机科学, 2021, 48(4): 43-48. https://doi.org/10.11896/jsjkx.201100139
[8]	马梦宇, 吴烨, 陈荦, 伍江江, 李军, 景宁. 显示导向型的大规模地理矢量实时可视化技术 Display-oriented Data Visualization Technique for Large-scale Geographic Vector Data 计算机科学, 2020, 47(9): 117-122. https://doi.org/10.11896/jsjkx.190800121
[9]	陈国良, 张玉杰. 并行计算学科发展历程 Development of Parallel Computing Subject 计算机科学, 2020, 47(8): 1-4. https://doi.org/10.11896/jsjkx.200600027
[10]	阳王东, 王昊天, 张宇峰, 林圣乐, 蔡沁耘. 异构混合并行计算综述 Survey of Heterogeneous Hybrid Parallel Computing 计算机科学, 2020, 47(8): 5-16. https://doi.org/10.11896/jsjkx.200600045
[11]	冯凯, 李婧. k元n方体的子网络可靠性研究 Study on Subnetwork Reliability of k-ary n-cubes 计算机科学, 2020, 47(7): 31-36. https://doi.org/10.11896/jsjkx.190700170
[12]	刘世芳, 赵永华, 于天禹, 黄荣锋. 广义稠密对称特征问题标准化算法在GPU集群上的有效实现 Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster 计算机科学, 2020, 47(4): 6-12. https://doi.org/10.11896/jsjkx.191000009
[13]	左宪禹, 张哲, 苏岳瀚, 刘扬, 葛强, 田军锋. 基于GPU多流并发并行模型的NDVI提取算法 Extraction Algorithm of NDVI Based on GPU Multi-stream Parallel Model 计算机科学, 2020, 47(4): 25-29. https://doi.org/10.11896/jsjkx.190500029
[14]	杨宗霖, 李天瑞, 刘胜久, 殷成凤, 贾真, 珠杰. 基于Spark Streaming的流式并行文本校对 Streaming Parallel Text Proofreading Based on Spark Streaming 计算机科学, 2020, 47(4): 36-41. https://doi.org/10.11896/jsjkx.190300070
[15]	贾经冬, 张筱曼, 郝璐, 谭火彬. 工业界需求工程关注点分析 Analysis of Focuses of Requirements Engineering in Industry 计算机科学, 2020, 47(12): 25-34. https://doi.org/10.11896/jsjkx.201200048

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

基于GPU加速的并行WMD算法

Parallel WMD Algorithm Based on GPU Acceleration

PDF (PC)

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 15

Metrics

本文评价

推荐阅读 0