Multi-feature Fusion for Short Text Similarity Calculation Based on LDA

doi:10.11896/j.issn.1002-137X.2018.09.044

Abstract

In recent years, the latent Dirichlet allocation (LDA) topic model has offered a new approach to short text similarity calculation by mining the latent semantic topics of text. Because short texts have sparse features, applying the LDA topic model on its own can easily produce inaccurate similarity results. This paper therefore presents an LDA-based calculation method that combines a similarity topics factor (ST) and a co-occurrence words factor (CW) into a joint similarity model. Depending on the interval in which ST falls, CW acts as either a constraint on or a supplement to ST, which yields more accurate text similarity. The method was verified with a text clustering experiment, and the results show that it improves the F-measure value to some extent.
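The abstract gives no formulas, so the following Python sketch only illustrates the general idea of letting CW constrain or supplement ST depending on the interval ST falls in. Everything concrete in it is an assumption made for illustration and is not taken from the paper: ST is stood in for by the cosine similarity of two topic distributions, CW by the Jaccard overlap of the two word sets, and the thresholds `low` and `high`, the combination formulas, and the function names are all hypothetical.

```python
import math


def topic_similarity(theta_a, theta_b):
    """Cosine similarity of two topic distributions (assumed stand-in for ST)."""
    dot = sum(a * b for a, b in zip(theta_a, theta_b))
    norm_a = math.sqrt(sum(a * a for a in theta_a))
    norm_b = math.sqrt(sum(b * b for b in theta_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cooccurrence(words_a, words_b):
    """Jaccard overlap of two word sets (assumed stand-in for CW)."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def joint_similarity(theta_a, theta_b, words_a, words_b, low=0.3, high=0.7):
    """Combine ST and CW; thresholds and formulas are hypothetical."""
    st = topic_similarity(theta_a, theta_b)
    cw = cooccurrence(words_a, words_b)
    if st >= high:
        # Constraint: a high topic score is damped when few words are shared.
        return st * (0.5 + 0.5 * cw)
    if st <= low:
        # Supplement: shared words raise a low topic score.
        return st + (1.0 - st) * cw
    # Middle interval: the topic score is used unchanged.
    return st
```

In this sketch, two texts with identical topic distributions but no shared words score about 0.5 rather than 1, while two texts with unrelated topic distributions but identical words are lifted from 0 to 1. The paper's actual intervals and weighting may differ.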

Key words: Co-occurrence words, LDA, Short text similarity, Similarity topics, Topic model

CLC Number: TP391

ZHANG Xiao-chuan, YU Lin-feng, ZHANG Yi-hao. Multi-feature Fusion for Short Text Similarity Calculation Based on LDA [J]. Computer Science, 2018, 45(9): 266-270.


URL: https://www.jsjkx.com/EN/10.11896／j.issn.1002-137X.2018.09.044

https://www.jsjkx.com/EN/Y2018/V45/I9/266

