Computer Science, 2013, Vol. 40, Issue (6): 256-259.


Implementation of Bayesian Inference on MCDB Distributed System

ZHOU Zhi-min and GAO Shen-yong   

Online: 2018-11-16; Published: 2018-11-16

Abstract: This paper describes how the Monte Carlo Database System (MCDB) can be used to easily implement Bayesian inference via Markov chain Monte Carlo (MCMC) over very large datasets. Linear Bayesian regression, latent Dirichlet allocation (LDA), and Dirichlet clustering are used as examples. To implement an MCMC simulation in MCDB, a programmer uses SQL to specify the dependencies among variables and how the variables parameterize one another. The paper presents a simple scheme for developing large-scale machine learning systems with SQL; with the help of MCDB, parallelization and optimization are handled automatically, achieving high computational efficiency.
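As an illustration of the kind of MCMC simulation the abstract refers to, the following is a minimal Gibbs sampler for Bayesian linear regression, written in plain Python rather than SQL. It is a sketch under stated assumptions, not the paper's MCDB implementation: the single-coefficient model y = beta * x + noise, the N(0, tau2) prior on beta, the inverse-gamma prior on sigma2, and the function name `gibbs_linear_regression` are all hypothetical choices made for illustration. Each update draws one variable conditioned on the current values of the others, which mirrors the variable dependencies that an MCDB programmer would declare in SQL.

```python
import random
import statistics


def gibbs_linear_regression(xs, ys, n_iter=2000, burn_in=500,
                            tau2=100.0, a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for y_i = beta * x_i + eps_i, eps_i ~ N(0, sigma2).

    Hypothetical priors for illustration: beta ~ N(0, tau2) and
    sigma2 ~ Inverse-Gamma(a0, b0). Returns post-burn-in draws.
    """
    rng = random.Random(seed)
    n = len(xs)
    # Sufficient statistics for the beta update; fixed across iterations.
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    beta, sigma2 = 0.0, 1.0  # arbitrary starting state
    beta_draws, sigma2_draws = [], []
    for it in range(n_iter):
        # beta | sigma2, data: conjugate normal full conditional.
        var = 1.0 / (sxx / sigma2 + 1.0 / tau2)
        mean = var * sxy / sigma2
        beta = rng.gauss(mean, var ** 0.5)

        # sigma2 | beta, data: conjugate inverse-gamma full conditional.
        ssr = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
        shape = a0 + n / 2.0
        rate = b0 + ssr / 2.0
        # gammavariate takes (shape, scale); 1 / Gamma(shape, rate)
        # is Inverse-Gamma(shape, rate).
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)

        if it >= burn_in:
            beta_draws.append(beta)
            sigma2_draws.append(sigma2)
    return beta_draws, sigma2_draws


if __name__ == "__main__":
    # Synthetic data: true beta = 2.0, true noise sd = 0.5.
    data_rng = random.Random(42)
    xs = [data_rng.gauss(0, 1) for _ in range(200)]
    ys = [2.0 * x + data_rng.gauss(0, 0.5) for x in xs]
    betas, sigma2s = gibbs_linear_regression(xs, ys)
    print("posterior mean beta:", statistics.mean(betas))
    print("posterior mean sigma2:", statistics.mean(sigma2s))
```

In this sketch only the residual sum of squares is recomputed over the full dataset on every iteration; that aggregation is the kind of data-parallel step a database engine could distribute, whereas here it runs serially.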

Key words: Bayesian inference, parallel algorithms, SQL, distributed system
