Computer Science ›› 2019, Vol. 46 ›› Issue (8): 260-265.doi: 10.11896/j.issn.1002-137X.2019.08.043

• Artificial Intelligence •

LDA Algorithm Based on Dynamic Weight

JU Ya-ya, YANG Lu, YAN Jian-feng   

  (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
  • Received: 2018-07-14  Online: 2019-08-15  Published: 2019-08-15

Abstract: Latent Dirichlet allocation (LDA) is a popular three-layer probabilistic topic model that clusters both the words within documents and the documents themselves at the topic level. The model is built on the bag-of-words (BOW) representation, in which every word carries the same importance. This simplifies modeling, but it biases topic distributions toward high-frequency words and degrades the semantic coherence of the topic model. To address this problem, an LDA algorithm based on dynamic weights was proposed. Its fundamental idea is that different words have different importance: during the iterative modeling process, word weights are generated dynamically from the topic distributions of the words and fed back into topic modeling, reducing the influence of high-frequency words and strengthening the role of keywords. Experiments on four public datasets show that the dynamic-weight LDA algorithm outperforms currently popular LDA inference algorithms in terms of topic semantic coherence, text classification accuracy, generalization performance, and precision.
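The page gives no code, so the following is only a minimal Python sketch of the feedback loop the abstract describes: a collapsed Gibbs sampler whose count matrices are scaled by per-word weights, with the weights recomputed after each sweep from the words' current topic distributions. The entropy-based weighting and all names (`entropy_weight`, `dynamic_weight_lda`) are illustrative assumptions, not the authors' actual scheme.

```python
import numpy as np

def entropy_weight(n_wk, eps=1e-12):
    """Hypothetical weighting: a word spread evenly over topics (typical
    of high-frequency words) gets a low weight; a word concentrated in
    few topics gets a weight close to 1."""
    p = n_wk / (n_wk.sum(axis=1, keepdims=True) + eps)
    H = -(p * np.log(p + eps)).sum(axis=1)      # topic entropy per word
    return 1.0 - H / np.log(n_wk.shape[1])      # normalized to [0, 1]

def rebuild_counts(docs, z, w_v, D, K, V):
    """Recompute weighted counts from scratch so the counts always agree
    with the current word weights."""
    n_dk = np.zeros((D, K)); n_kw = np.zeros((K, V))
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += w_v[w]; n_kw[k, w] += w_v[w]
    return n_dk, n_kw, n_kw.sum(axis=1)

def dynamic_weight_lda(docs, V, K=10, alpha=0.1, beta=0.01, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    D = len(docs)
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random init
    w_v = np.ones(V)                            # start with uniform weights
    for _ in range(iters):
        n_dk, n_kw, n_k = rebuild_counts(docs, z, w_v, D, K, V)
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove the token's weighted contribution
                n_dk[d, k] -= w_v[w]; n_kw[k, w] -= w_v[w]; n_k[k] -= w_v[w]
                # standard collapsed-Gibbs conditional, on weighted counts
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                z[d][i] = k = rng.choice(K, p=p / p.sum())
                n_dk[d, k] += w_v[w]; n_kw[k, w] += w_v[w]; n_k[k] += w_v[w]
        # feedback step: new weights from the current topic-word counts
        w_v = entropy_weight(n_kw.T)
    return n_kw, w_v

if __name__ == "__main__":
    docs = [[0, 1, 2, 2, 2], [1, 2, 3], [0, 3, 3, 1], [2, 2, 0, 1]]  # toy corpus
    n_kw, w = dynamic_weight_lda(docs, V=4, K=2, iters=20)
    print("word weights:", np.round(w, 3))
```

Because the weights are held fixed within a sweep and the counts are rebuilt whenever they change, the weighted counts stay consistent with the topic assignments; the paper's actual weighting function and update schedule may differ.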

Key words: Dynamic weight, Latent Dirichlet allocation, Topic model

CLC Number: TP391
[1]SALTON G,MCGILL M J.Introduction to Modern Information Retrieval [M].New York:McGraw-Hill,1983:239-240.
[2]DEERWESTER S,DUMAIS S T,FURNAS G W,et al.Indexing by latent semantic analysis[J].Journal of the American Society for Information Science,1990,41(6):391-407.
[3]HOFMANN T.Probabilistic latent semantic indexing[C]∥Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.New York:ACM Press,1999:50-57.
[4]HOFMANN T.Unsupervised learning by probabilistic latent semantic indexing[J].Sigir Audit Reports,1999,40(22):28-31.
[5]BLEI D M,NG A Y,JORDAN M I.Latent Dirichlet allocation [J].Journal of Machine Learning Research,2003,3(Jan):993-1022.
[6]LI X,OUYANG J,ZHOU X.Labelset topic model for multi-label document classification [J].Journal of Intelligent Information Systems,2016,46(1):83-97.
[7]WU M S.Modeling query-document dependencies with topic language models for information retrieval [J].Information Sciences,2015,312(C):1-12.
[8]GRIFFITHS T L,STEYVERS M.Finding scientific topics[J].Proceedings of the National Academy of Sciences,2004,101(Suppl 1):5228-5235.
[9]LIU X,ZENG J,YANG X,et al.Scalable Parallel EM Algorithms for Latent Dirichlet Allocation in Multi-Core Systems[C]∥Proceedings of the 24th International Conference on World Wide Web.Florence,Italy:ACM,2015:669-679.
[10]ZHANG J,ZENG J,YUAN M,et al.LDA Revisited:Entropy,Prior and Convergence[C]∥Proceedings of the 25th ACM International Conference on Information and Knowledge Management.New York:ACM,2016:1763-1772.
[11]MIMNO D,WALLACH H M,TALLEY E,et al.Optimizing Semantic Coherence in Topic Models[C]∥Proceedings of the Conference on Empirical Methods in Natural Language Processing.Association for Computational Linguistics,2011:262-272.
[12]PETTERSON J,SMOLA A,CAETANO T,et al.Word features for Latent Dirichlet Allocation[C]∥International Conference on Neural Information Processing Systems.Curran Associates Inc.,2010:1921-1929.
[13]LI X,ZHANG A,LI C,et al.Exploring coherent topics by topic modeling with term weighting [J].Information Processing & Management,2018,54(6):1345-1358.
[14]WILSON A T,CHEW P A.Term weighting schemes for Latent Dirichlet Allocation[C]∥Human Language Technologies:the 2010 Conference of the North American Chapter of the Association for Computational Linguistics.Association for Computational Linguistics,2010:465-473.
[15]NEWMAN D,KARIMI S,CAVEDON L.External evaluation of topic models[C]∥Australasian Document Computing Symposium (ADCS).Sydney,Australia:University of Sydney,2009:1-8.
[16]SHAMS M,BARAANI-DASTJERDI A.Enriched LDA (ELDA):Combination of latent Dirichlet allocation with word co-occurrence analysis for aspect extraction[J].Expert Systems with Applications,2017,80(C):136-146.
[17]ZIPF G K.Human behavior and the principle of least effort:An introduction to human ecology[M].Boston:Addison-Wesley Press,1949:180-183.
[18]LIN J.Divergence measures based on the Shannon entropy [J]. IEEE Transactions on Information Theory,1991,37(1):145-151.
[19]WU X,ZENG J,YAN J,et al.Finding Better Topics:Features,Priors and Constraints[C]∥Pacific-Asia Conference on Knowledge Discovery and Data Mining.New York:Springer,2014:296-310.
[20]NEWMAN D,LAU J H,GRIESER K,et al.Automatic evaluation of topic coherence[C]∥The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics.Los Angeles,California:Association for Computational Linguistics,2010:100-108.
[21]CHANG D Y,YAN J F,YANG L,et al.Sliding-window Based Topic Modeling[J].Computer Science,2016,43(12):101-107.(in Chinese)
[22]GAO Y,YANG L,LIU X S,et al.Study of Semantic Understanding by LDA[J].Computer Science,2015,42(8):279-282.(in Chinese)