计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 24-28.doi: 10.11896/jsjkx.210600213

• 计算机体系结构* 上一篇    下一篇

基于GPU加速的并行WMD算法

胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立   

  1. 湖南大学信息科学与工程学院 长沙410082
  • 收稿日期:2021-06-25 修回日期:2021-07-16 出版日期:2021-12-15 发布日期:2021-11-26
  • 通讯作者: 阳王东(yangwangdong@hnu.edu.cn)
  • 作者简介:upupwords@hnu.edu.cn
  • 基金资助:
    国家重点研发计划课题(2018YFB0204302);国家自然科学基金重点项目(92055213);国家自然科学基金项目(61872127,61751204)

Parallel WMD Algorithm Based on GPU Acceleration

HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li   

  1. College of Computer Science and Electronic Engineering,Hunan University,Changsha 410082,China
  • Received:2021-06-25 Revised:2021-07-16 Online:2021-12-15 Published:2021-11-26
  • About author:HU Rong,born in 1997,postgraduate.Her main research interests include high-performance computing and so on.
    YANG Wang-dong,born in 1974,professor.His main research interests include high performance computing and parallel computing.
  • Supported by:
    National Key R & D Program of China(2018YFB0204302),Key Program of National Natural Science Foundation of China(92055213) and National Natural Science Foundation of China(61872127,61751204).

摘要: Word Mover's Distance(WMD)是一种度量文本相似度的方法,它将两个文本之间的差异定义为文本的词嵌入向量之间的最小距离。WMD利用词汇表,将文本表示为归一化的词袋向量。文本的单词在语料中所占的比例很小,因此用词袋模型生成的文本向量很稀疏。多个文本可以组成一个高维的稀疏矩阵,这样的稀疏矩阵会生成大量不必要的运算。通过一次性对多个目标文本计算单个源文本的WMD,可以使计算过程高度并行化。针对文本向量的稀疏性,文中提出了一种基于GPU的并行Sinkhorn-WMD算法,采取压缩格式存储目标文本的方式来提高内存利用率,根据稀疏结构减少中间过程的计算。利用预训练词嵌入向量计算单词距离矩阵,对WMD算法进行改进,在两个公开的新闻数据集上进行优化算法的验证。实验结果表明,在NVIDIA TITAN RTX上并行算法与CPU串行相比最高可以达到67.43倍的加速。

关键词: 文本相似度, WMD, 并行计算, GPU, 稀疏矩阵乘法

Abstract: Word Mover's Distance (WMD) is a method of measuring text similarity.It defines the difference between two texts as the minimum distance between the word embedding vectors of the text.WMD uses the vocabulary to represent the text as a normalized bag-of-words vector.Since the words of the text occupies a small proportion in the corpus,the document vector gene-rated by the bag-of-words model is very sparse.Multiple documents can form a high-dimensional sparse matrix,such a sparse matrix will generate a lot of unnecessary operations.By calculating the WMD of a single source document for multiple target documents at once,the calculation process can be highly parallelized.Aiming at the sparsity of text vectors,this paper proposes a GPU-based parallel Sinkhorn-WMD algorithm,which uses compressed format to store target text to improve memory utilization,and reduces intermediate calculations based on the sparse structure.The pre-trained word embedding vector is used to calculate the word distance matrix,the WMD algorithm is improved,and the optimization algorithm is verified on two public news data sets.The experimental results show that the parallel algorithm on NVIDIA TITAN RTX can achieve a speedup of up to 67.43× compared with the CPU serial algorithm.

Key words: Text similarity, WMD, Parallel computing, GPU, Sparse matrix multiplication

中图分类号: 

  • TP391
[1]BROKOS G I,MALAKASIOTIS P,ANDROUTSOPOULOS I.Using Centroids of Word Embeddings and Word Mover's Distance for Biomedical Document Retrieval in Question Answering[J].arXiv:1608.03905,2016.
[2]LU C,ARONSON A,SHOOSHAN S,et al.Spell checker for consumer language (CSpell)[J].Journal of the American Medical Informatics Association,2019,26:211-218.
[3]ALZAHRANI S,SALIM N.Fuzzy semantic-based string similarity for extrinsic plagiarism detection[J].Braschler and Harman,2010,1176:1-8.
[4]KUSNER M,SUN Y,KOLKIN N,et al.From word embeddings to document distances[C]//International Conference on Machine Learning.PMLR,2015:957-966.
[5]MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems.2013:3111-3119.
[6]MIKOLOV T,CHEN K,CORRADO G,et al.Efficient Estimation of Word Representations in Vector Space[J].arXiv:1301.3781,2013.
[7]LIU B.Text sentiment analysis based on CBOW model and deep learning in big data environment[J].Journal of Ambient Intelligence and Humanized Computing,2020,11(2):451-458.
[8]SHAZEER N,PELEMANS J,CHELBA C.Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation[J].arXiv:1412.1454,2015.
[9]TITHI J J,PETRINI F.An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute the Word Mover's Distance[J].arXiv:2005.06727,2021.
[10]RUBNER Y,TOMASI C,GUIBAS L J.The Earth Mover's Distance as a Metric for Image Retrieval[J].International Journal of Computer Vision,2000,40(2):99-121.
[11]PEYRÉ G,CUTURI M.Computational optimal transport:With applications to data science[J].Foundations and Trends© in Machine Learning,2019,11(5/6):355-607.
[12]ALVAREZ-MELIS D,JEGELKA S,JAAKKOLA T S.To- wards optimal transport with global invariances[C]//22nd International Conference on Artificial Intelligence and Statistics.PMLR,2019:1870-1879.
[13]CORENFLOS A,THORNTON J,DOUCET A,et al.Differen- tiable Particle Filtering via Entropy-Regularized Optimal Transport[J].arXiv:2102.07850,2021.
[14]FEYDY J,ROUSSILLON P,TROUVÉ A,et al.Fast and scalable optimal transport for brain tractograms[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention.Cham:Springer,2019:636-644.
[15]CUTURI M.Sinkhorn Distances:Lightspeed Computation of Optimal Transportation Distances[J].arXiv:1306.0895,2013.
[16]KNIGHT P A.The Sinkhorn-Knopp Algorithm:Convergence and Applications[J].SIAM Journal on Matrix Analysis and Applications,2008,30(1):261-275.
[17]CUDA C++ Best Practices Guide[EB/OL].(2021-07-13). http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.
[1] 傅天豪, 田鸿运, 金煜阳, 杨章, 翟季冬, 武林平, 徐小文. 一种面向构件化并行应用程序的性能骨架分析方法[J]. 计算机科学, 2021, 48(6): 1-9.
[2] 何亚茹, 庞建民, 徐金龙, 朱雨, 陶小涵. 基于神威平台的Floyd并行算法的实现和优化[J]. 计算机科学, 2021, 48(6): 34-40.
[3] 冯凯, 马鑫玉. (n,k)-冒泡排序网络的子网络可靠性[J]. 计算机科学, 2021, 48(4): 43-48.
[4] 李繁, 严星, 张晓宇. 基于GPU的特征脸算法优化研究[J]. 计算机科学, 2021, 48(4): 197-204.
[5] 马梦宇, 吴烨, 陈荦, 伍江江, 李军, 景宁. 显示导向型的大规模地理矢量实时可视化技术[J]. 计算机科学, 2020, 47(9): 117-122.
[6] 陈国良, 张玉杰. 并行计算学科发展历程[J]. 计算机科学, 2020, 47(8): 1-4.
[7] 阳王东, 王昊天, 张宇峰, 林圣乐, 蔡沁耘. 异构混合并行计算综述[J]. 计算机科学, 2020, 47(8): 5-16.
[8] 冯凯, 李婧. k元n方体的子网络可靠性研究[J]. 计算机科学, 2020, 47(7): 31-36.
[9] 刘世芳, 赵永华, 于天禹, 黄荣锋. 广义稠密对称特征问题标准化算法在GPU集群上的有效实现[J]. 计算机科学, 2020, 47(4): 6-12.
[10] 左宪禹, 张哲, 苏岳瀚, 刘扬, 葛强, 田军锋. 基于GPU多流并发并行模型的NDVI提取算法[J]. 计算机科学, 2020, 47(4): 25-29.
[11] 杨宗霖, 李天瑞, 刘胜久, 殷成凤, 贾真, 珠杰. 基于Spark Streaming的流式并行文本校对[J]. 计算机科学, 2020, 47(4): 36-41.
[12] 贾经冬, 张筱曼, 郝璐, 谭火彬. 工业界需求工程关注点分析[J]. 计算机科学, 2020, 47(12): 25-34.
[13] 邓定胜. 一种改进的DBSCAN算法在Spark平台上的应用[J]. 计算机科学, 2020, 47(11A): 425-429.
[14] 徐传福,王曦,刘舒,陈世钊,林玉. 基于Python的大规模高性能LBM多相流模拟[J]. 计算机科学, 2020, 47(1): 17-23.
[15] 徐磊, 陈荣亮, 蔡小川. 基于非结构化网格的高可扩展并行有限体积格子[J]. 计算机科学, 2019, 46(8): 84-88.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 王焕文,徐晓刚,徐冠雷,王孝通. 基于阴影不一致的简易人像篡改鉴别[J]. 计算机科学, 2014, 41(Z6): 129 -131 .
[2] 梁俊斌, 马方强, 蒋婵. 动态无线传感网中数据查询技术的研究进展[J]. 计算机科学, 2019, 46(11): 41 -48 .
[3] 崔艳鹏,刘咪,胡建伟. 基于CNN的恶意Web请求检测技术[J]. 计算机科学, 2020, 47(2): 281 -286 .
[4] 陈俊芬,张明,赵佳成. 复杂高维数据的密度峰值快速搜索聚类算法[J]. 计算机科学, 2020, 47(3): 79 -86 .
[5] 崔阳, 刘长红. 基于PIFA的语音识别系统评测平台[J]. 计算机科学, 2020, 47(11A): 638 -641 .
[6] 潘孝勤, 芦天亮, 杜彦辉, 仝鑫. 基于深度学习的语音合成与转换技术综述[J]. 计算机科学, 2021, 48(8): 200 -208 .
[7] 王俊, 王修来, 庞威, 赵鸿飞. 面向科技前瞻预测的大数据治理研究[J]. 计算机科学, 2021, 48(9): 36 -42 .
[8] 余力, 杜启翰, 岳博妍, 向君瑶, 徐冠宇, 冷友方. 基于强化学习的推荐研究综述[J]. 计算机科学, 2021, 48(10): 1 -18 .
[9] 王梓强, 胡晓光, 李晓筱, 杜卓群. 移动机器人全局路径规划算法综述[J]. 计算机科学, 2021, 48(10): 19 -29 .
[10] 高洪皓, 郑子彬, 殷昱煜, 丁勇. 区块链技术专题序言[J]. 计算机科学, 2021, 48(11): 1 -3 .